Datacenter Simulation Methodologies Case Studies


Datacenter Simulation Methodologies: Case Studies. Tamara Silbergleit Lehman, Qiuyun Wang, Seyed Majid Zahedi and Benjamin C. Lee. This work is supported by NSF grants CCF-1149252 and CCF-1337215, and by STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA.

Tutorial Schedule. 09:00-09:30 Introduction; 09:30-10:30 Setting up MARSSx86 and DRAMSim2; 10:30-11:00 Break; 11:00-12:00 Spark simulation; 12:00-13:00 Lunch; 13:00-13:30 Spark (continued); 13:30-14:30 GraphLab simulation; 14:30-15:00 Break; 15:00-16:15 Web search simulation; 16:15-17:00 Case studies. 2 / 45

Big Computing and its Challenges. Big data demands big computing, yet we face challenges in architecture design, systems management, and research coordination. 3 / 45

Toward Energy-Efficient Datacenters. Heterogeneity tailors hardware to software, reducing energy; it complicates resource allocation and scheduling, and introduces risk. Sharing divides hardware over software, amortizing energy; it complicates task placement and co-location, and introduces risk. 4 / 45

Datacenter Design and Management Heterogeneity for Efficiency Heterogeneous datacenters deploy mix of server- and mobile-class hardware Heterogeneity and Markets Agents bid for heterogeneous hardware in a market that maximizes welfare Sharing and Game Theory Agents share multiprocessors with game-theoretic fairness guarantees 5 / 45

Datacenter Design and Management Heterogeneity for Efficiency Heterogeneous datacenters deploy mix of server- and mobile-class hardware Web search using mobile cores [ISCA 10] Towards energy-proportional datacenter memory with mobile DRAM [ISCA 12] 5 / 45

Mobile versus Server Processors (Atom vs. Xeon [Intel]). Simpler datapath: issue fewer instructions per cycle, speculate less often. Smaller caches: less capacity, lower associativity. Slower clock: reduced processor frequency. 6 / 45

Applications in Transition. Conventional enterprise: independent requests, memory- and I/O-intensive (e.g., web or file server). Emerging datacenter: inference and analytics, compute-intensive (e.g., neural network). Reddi et al., Web search using mobile cores [ISCA 10] 7 / 45

Web Search Distribute web pages among index servers Distribute queries among index servers Rank indexed pages with neural network Bing.com [Microsoft] 8 / 45

Query Efficiency (Atom versus Xeon). Power (Joules per second): roughly 10x lower on Atom. Query throughput (queries per second): roughly 2x lower on Atom. Efficiency (queries per Joule): roughly 5x higher on Atom. Reddi et al., Web search using mobile cores [ISCA 10] 9 / 45
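
As a quick sanity check (a sketch of the arithmetic using the rounded factors above, not measured values): queries per Joule is query throughput divided by power, so

\[ \frac{\mathrm{QPJ}_{\mathrm{Atom}}}{\mathrm{QPJ}_{\mathrm{Xeon}}} = \frac{\mathrm{QPS}_{\mathrm{Atom}}}{\mathrm{QPS}_{\mathrm{Xeon}}} \cdot \frac{P_{\mathrm{Xeon}}}{P_{\mathrm{Atom}}} \approx \frac{1}{2} \cdot 10 = 5. \]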

Case for Processor Heterogeneity. Mobile-core efficiency: roughly 5x more queries per Joule. Mobile-core latency: about 10% of queries exceed the latency cut-off, and complex queries suffer. Heterogeneity: small cores for simple queries, big cores for complex queries. Reddi et al., Web search using mobile cores [ISCA 10] 10 / 45

Memory Architecture and Applications. Conventional enterprise: high bandwidth (e.g., transaction processing). Emerging datacenter: low bandwidth, under 6% of DDR3 peak (e.g., search [Microsoft], memcached [Facebook]). 11 / 45

Memory Capacity vs Bandwidth. Online services: use < 6% of bandwidth but 65-97% of capacity (e.g., Microsoft mail, map-reduce, search [Kansal+]). Memory caching: 75% of Facebook data resides in memory (e.g., memcached, RAMCloud [Ousterhout+]). Capacity-bandwidth bundles: a server with 4 sockets and 8 channels bundles, for example, 32GB of capacity with >100GB/s of bandwidth. 12 / 45

Mobile-class Memory (LP-DDR2 vs DDR3 [Micron]). Operating parameters: lower active current (130mA vs 250mA), lower standby current (20mA vs 70mA). Low-power interfaces: no delay-locked loops or on-die termination, lower bus frequency (400 vs 800MHz), lower peak bandwidth (6.4 vs 12.8GB/s). 13 / 45

Source of Disproportionality. At low activity (e.g., 16% of DDR3 peak bandwidth), large power overheads dominate, so the cost per bit is high. Calculating memory system power for DDR3 [Micron] 14 / 45

Case for Memory Heterogeneity. Mobile memory efficiency: roughly 5x more bits per Joule. Mobile memory bandwidth: roughly 0.5x the peak bandwidth. Heterogeneity: LPDDR for search and memcached; DDR for databases and HPC. Malladi et al., Towards energy-proportional datacenter memory with mobile DRAM [ISCA 12] 15 / 45

Datacenter Design and Management Heterogeneity and Markets Agents bid for heterogeneous hardware in a market that maximizes welfare Navigating heterogeneous processors with market mechanisms [HPCA 13] Strategies for anticipating risk in heterogeneous system design [HPCA 14] 16 / 45

Datacenter Heterogeneity. Systems are heterogeneous: virtual machines are sized, physical machines are diverse. Heterogeneity is exposed: users assess machine price and select machine type. The burden is prohibitive: users must understand hardware-software interactions. Elastic Compute Cloud (EC2) [Amazon] 17 / 45

Managing Performance Risk. Risk: the possibility that something bad will happen. Understand heterogeneity and risk: what types of hardware? how many of each type? what allocation to users? Mitigate risk with market allocation: ensure service quality, hide hardware complexity, trade off performance and power. 18 / 45

Market Mechanism User specifies value for performance Market shields user from heterogeneity Guevara et al. Navigating heterogeneous processors with market mechanisms [HPCA 13] 19 / 45

Proxy Bidding User Provides... Task stream Service-level agreement Proxy Provides... µarchitectural insight Performance profiles Bids for hardware Guevara et al. Navigating heterogeneous processors with market mechanisms [HPCA 13] Wu and Lee, Inferred models for dynamic and sparse hardware-software spaces [MICRO 12] 20 / 45

Visualizing Heterogeneity (2 Processor Types). [Figure: design space spanning high-performance and low-power processor types; ellipses represent hardware types, points are combinations of processor types, and colors show QoS violation rates for heterogeneous versus homogeneous configurations.] Guevara et al. Navigating heterogeneous processors with market mechanisms [HPCA 13] 21 / 45

Further Heterogeneity (4 Processor Types). [Figure: configurations of four processor types (oo6w24, io4w24, io1w24, io1w10), with colors showing QoS violation rates.] The best configuration is heterogeneous; QoS violations fall from 16% to 2%. Trade-offs motivate design for manageability. Guevara et al. Navigating heterogeneous processors with market mechanisms [HPCA 13] Guevara et al. Strategies for anticipating risk in heterogeneous datacenter design [HPCA 14] 22 / 45

Datacenter Design and Management Sharing and Game Theory Agents share multiprocessors with game-theoretic fairness guarantees REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 23 / 45

Case for Sharing. Big servers: hardware is under-utilized, and sharing amortizes power. Heterogeneous users: tasks are diverse, users are complementary, and users prefer flexibility. Sharing challenges: allocate multiple resources, ensure fairness. Image: Intel Sandy Bridge E die [www.anandtech.com] 24 / 45

Motivation Alice and Bob are working on research papers Each has $10K to buy computers Alice and Bob have different types of tasks Alice and Bob have different paper deadlines 25 / 45

Strategic Behavior Alice and Bob are strategic Which is better? Small, separate clusters Large, shared cluster Suppose Alice and Bob share Is allocation fair? Is lying beneficial? Image: [www.websavers.org] 26 / 45

Conventional Wisdom in Computer Architecture. Users must share, which overlooks strategic behavior. The fairness policy is equal slowdown, which fails to encourage envious users to share. Heuristic mechanisms enforce equal slowdown but fail to give provable guarantees. 27 / 45

Rethinking Fairness If an allocation is both equitable and Pareto efficient,... it is fair. [Varian, Journal of Economic Theory (1974)] 28 / 45

Resource Elasticity Fairness (REF). REF is an allocation mechanism that guarantees game-theoretic desiderata for shared chip multiprocessors. Sharing incentives: users perform no worse than under equal division. Envy-free: no user envies another's allocation. Pareto-efficient: no other allocation improves utility without harming others. Strategy-proof: no user benefits from lying. Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 29 / 45

Cobb-Douglas Utility. u(x) = \prod_{r=1}^{R} x_r^{\alpha_r}, where u is utility (e.g., performance), x_r is the allocation of resource r (e.g., cache size), and \alpha_r is the elasticity of resource r. Cobb-Douglas fits preferences in computer architecture: exponents model diminishing marginal returns, and products model substitution effects. Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 30 / 45
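
The utility function is simple enough to compute directly. The following sketch (illustrative only; the resource names and elasticity values are hypothetical, not taken from the paper) evaluates a Cobb-Douglas utility for a given allocation:

```python
from math import prod

def cobb_douglas(alloc, elasticity):
    """Cobb-Douglas utility u(x) = prod_r x_r^alpha_r, where alloc maps each
    resource r to its allocation x_r and elasticity maps r to alpha_r."""
    return prod(alloc[r] ** elasticity[r] for r in alloc)

# Hypothetical user whose performance is more sensitive to bandwidth than cache.
alpha = {"bandwidth_GBps": 0.6, "cache_MB": 0.4}
print(cobb_douglas({"bandwidth_GBps": 18.0, "cache_MB": 4.0}, alpha))
```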

Example Utilities. u_1 = x_1^{0.6} y_1^{0.4} and u_2 = x_2^{0.2} y_2^{0.8}, where u_1, u_2 are performance, x_1, x_2 are allocated memory bandwidth, and y_1, y_2 are allocated cache size. Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 31 / 45

Possible Allocations. Two users share 12MB of cache and 24GB/s of memory bandwidth. [Figure: the allocation space drawn as a box with memory bandwidth and cache size axes; user 1's allocation is measured from one corner and user 2's from the opposite corner.] Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 32 / 45

Envy-Free (EF) Allocations. Identify EF allocations for each user: u_1(A_1) \geq u_1(A_2) and u_2(A_2) \geq u_2(A_1), i.e., neither user prefers the other's allocation. [Figure: each user's EF region in the allocation space.] Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 33 / 45

Pareto-Efficient (PE) Allocations. No other allocation improves a user's utility without harming others. [Figure: the contract curve of Pareto-efficient allocations in the allocation space.] Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 34 / 45

Fair Allocations. Fairness = envy-freeness + Pareto efficiency. [Figure: the intersection of the EF regions and the contract curve in the allocation space.] Many possible fair allocations! Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 35 / 45
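
For the two example users above, envy-freeness of a candidate split is easy to check numerically. The sketch below is a minimal illustration (it assumes the example elasticities and the 24 GB/s / 12 MB totals from the slides; it is not the REF mechanism itself):

```python
def utility(bw, cache, a_bw, a_cache):
    """Cobb-Douglas utility over a bandwidth/cache allocation."""
    return (bw ** a_bw) * (cache ** a_cache)

TOTAL_BW, TOTAL_CACHE = 24.0, 12.0  # GB/s and MB, totals from the example

def is_envy_free(bw1, cache1):
    """True if neither user prefers the other's bundle to its own."""
    bw2, cache2 = TOTAL_BW - bw1, TOTAL_CACHE - cache1
    u1_keeps = utility(bw1, cache1, 0.6, 0.4) >= utility(bw2, cache2, 0.6, 0.4)
    u2_keeps = utility(bw2, cache2, 0.2, 0.8) >= utility(bw1, cache1, 0.2, 0.8)
    return u1_keeps and u2_keeps

print(is_envy_free(18.0, 4.0))  # True: a bandwidth-heavy bundle for user 1
print(is_envy_free(2.0, 1.0))   # False: user 1 would envy user 2's bundle
```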

Mechanism for Resource Elasticity Fairness. Steps: profile preferences, fit the utility function, normalize elasticities, allocate proportionally. The mechanism guarantees the desiderata: sharing incentives, envy-freeness, Pareto efficiency, and strategy-proofness. Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 36 / 45

Profiling for REF (mechanism step: profile preferences; then fit utility function, normalize elasticities, allocate proportionally). Options: off-line profiling with synthetic benchmarks; off-line simulations on various hardware; machine learning (start with α = 0.5, then update). Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 37 / 45

Fitting Utilities (mechanism step: fit utility function). Take logarithms of the Cobb-Douglas form: u = \prod_{r=1}^{R} x_r^{\alpha_r} implies \log(u) = \sum_{r=1}^{R} \alpha_r \log(x_r). Use linear regression to find the \alpha_r. Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 38 / 45
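
A minimal sketch of this fit using least squares on the log-transformed model; the profiling samples below are hypothetical, and the paper's actual regression setup may differ:

```python
import numpy as np

# Hypothetical profiling samples: (bandwidth GB/s, cache MB) -> measured IPC.
allocs = np.array([[4.0, 2.0], [8.0, 2.0], [8.0, 6.0], [16.0, 4.0], [20.0, 10.0]])
ipc = np.array([0.42, 0.55, 0.63, 0.74, 0.92])

# Fit log(u) = alpha_bw*log(x_bw) + alpha_cache*log(x_cache) + c by least squares.
X = np.column_stack([np.log(allocs), np.ones(len(allocs))])
(alpha_bw, alpha_cache, c), *_ = np.linalg.lstsq(X, np.log(ipc), rcond=None)
print(f"fitted elasticities: bandwidth={alpha_bw:.2f}, cache={alpha_cache:.2f}")
```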

Cobb-Douglas Accuracy. [Figure: simulated versus estimated IPC for ferret and fmm across allocations.] Utility is instructions per cycle; resources are cache size and memory bandwidth. Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 39 / 45

Normalizing Utilities (mechanism step: normalize elasticities). Compare users' elasticities on the same scale: for example, u = x^{0.2} y^{0.3} is normalized to u = x^{0.4} y^{0.6} so that the elasticities sum to one. Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 40 / 45
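
A sketch of that normalization (rescaling each user's elasticities to sum to one, which is a monotone transform of the utility and so preserves the user's preferences):

```python
def normalize(elasticity):
    """Rescale a user's elasticities so they sum to one."""
    total = sum(elasticity.values())
    return {r: a / total for r, a in elasticity.items()}

print(normalize({"x": 0.2, "y": 0.3}))  # -> {'x': 0.4, 'y': 0.6}
```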

Allocating Proportional Shares (mechanism step: allocate proportionally). With u_1 = x_1^{0.6} y_1^{0.4} and u_2 = x_2^{0.2} y_2^{0.8}, memory bandwidth is divided in proportion to its elasticities: x_1 = \frac{0.6}{0.6 + 0.2} \times 24 = 18 GB/s and x_2 = \frac{0.2}{0.6 + 0.2} \times 24 = 6 GB/s. Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 41 / 45
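
The same rule generalizes to any number of users and resources: divide each resource in proportion to the users' (normalized) elasticities for it. A minimal sketch, reproducing the 18/6 GB/s bandwidth split above:

```python
def proportional_shares(elasticities, capacities):
    """Split each resource's capacity in proportion to users' elasticities."""
    shares = {}
    for resource, total in capacities.items():
        weight = sum(user[resource] for user in elasticities.values())
        shares[resource] = {name: total * user[resource] / weight
                            for name, user in elasticities.items()}
    return shares

users = {"user1": {"bandwidth": 0.6, "cache": 0.4},
         "user2": {"bandwidth": 0.2, "cache": 0.8}}
totals = {"bandwidth": 24.0, "cache": 12.0}  # GB/s and MB from the example
print(proportional_shares(users, totals))
# bandwidth: user1 -> 18.0, user2 -> 6.0; cache: user1 -> 4.0, user2 -> 8.0
```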

Equal Slowdown versus REF. [Figure: resource allocation as % of total capacity under each policy.] Equal slowdown provides neither sharing incentives (SI) nor envy-freeness (EF): canneal receives less than half of the cache and memory. Resource elasticity fairness provides both SI and EF: canneal receives more cache and less memory. Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 42 / 45

Performance versus Fairness. [Figure: weighted system throughput for Max Welfare w/o Fairness, Equal Slowdown w/o Fairness, Max Welfare w/ Fairness, and Proportional Elasticity w/ Fairness across workloads WD1 (4C), WD2 (2C-2M), WD3 (4M), WD4 (3C-1M), and WD5 (1C-3M).] Measure weighted instruction throughput; REF incurs < 10% penalty. Zahedi et al. REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 43 / 45

Datacenter Design and Management Heterogeneity for Efficiency Heterogeneous datacenters deploy mix of server- and mobile-class hardware Web search using mobile cores [ISCA 10] Towards energy-proportional datacenter memory with mobile DRAM [ISCA 12] Heterogeneity and Markets Agents bid for heterogeneous hardware in a market that maximizes welfare Navigating heterogeneous processors with market mechanisms [HPCA 13] Strategies for anticipating risk in heterogeneous system design [HPCA 14] Sharing and Game Theory Agents share multiprocessors with game-theoretic fairness guarantees REF: Resource elasticity fairness with sharing incentives for multiprocessors [ASPLOS 14] 44 / 45

Datacenter Design and Management. Heterogeneity for efficiency: heterogeneous datacenters deploy a mix of server- and mobile-class hardware (processors: hardware counters for the CPI stack; memories: simulator for cache and bandwidth activity). Heterogeneity and markets: agents bid for heterogeneous hardware in a market that maximizes welfare (processors: simulator for core performance; server racks: queueing models, e.g., M/M/1). Sharing and game theory: agents share multiprocessors with game-theoretic fairness guarantees (memories: simulator for cache and bandwidth utility). 45 / 45