A Probabilistic Graphical Model-based Approach for Minimizing Energy under Performance Constraints
- Michael Cameron
- 5 years ago
Transcription
1 A Probabilistic Graphical Model-based Approach for Minimizing Energy under Performance Constraints Nikita Mishra, Huazhe Zhang, John Lafferty and Hank Hoffmann University of Chicago
2 Average CPU utilization of more than 5,000 servers during a 6-month period [1]. Plot axes: fraction of time vs. CPU utilization. [1] Barroso, Luiz André, and Urs Hölzle. "The case for energy-proportional computing." IEEE Computer (2007).
3 Example of a configuration space. Diagram labels: Clock Speed (2.26 Hz), Memory Controllers 1, 2 and 3, Cores.
5 Adaptive systems: Automatically tune configurations for different utilizations to achieve the most energy-efficient state. This requires the power and performance profile of the application.
9 Why is it a difficult problem? The configuration space can be quite large, so a brute-force search may take a long time. The behavior of each application differs from machine to machine. Application behavior can even vary with the input, e.g. the video streaming application x264.
12 Example: streamcluster. A contour plot of performance rate (in iter/s) for the streamcluster benchmark at different configurations (axes: clock speed and cores). The plot shows multiple local solutions.
13 Example: kmeans. Optimal configuration frontier: Pareto frontier of performance rate (in iter/s) vs. system power (in Watts) at different configurations.
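The idea of a Pareto frontier over configurations can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `pareto_frontier` and the sample (performance, power) pairs are hypothetical. A configuration is kept only if no other configuration delivers at least the same performance at no more power.

```python
def pareto_frontier(points):
    """Return indices of Pareto-optimal configurations, sorted by power.

    points: list of (performance, power) tuples, one per configuration.
    Higher performance is better; lower power is better.
    """
    # Visit configurations from lowest to highest power; on ties, the
    # higher-performance configuration comes first.
    order = sorted(range(len(points)),
                   key=lambda i: (points[i][1], -points[i][0]))
    frontier = []
    best_perf = float("-inf")
    for i in order:
        perf = points[i][0]
        # Keep a configuration only if it beats every cheaper one.
        if perf > best_perf:
            frontier.append(i)
            best_perf = perf
    return frontier


# Hypothetical (iter/s, Watts) measurements for six configurations.
samples = [(10, 50), (20, 80), (15, 90), (30, 120), (30, 130), (5, 50)]
frontier = pareto_frontier(samples)
```

Here configurations 0, 1 and 3 form the frontier; the others are dominated by a configuration that is at least as fast for no more power.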
15 LEO (Learning for Energy Optimization). Inputs: historical data and the target application. LEO incorporates the performance profiles of previously seen applications.
16 Example: kmeans. Performance rate (in iter/s) vs. configuration index: estimated Pareto-optimal frontiers vs. the true frontier found with exhaustive search.
17 Outline: Motivation/Overview; Statistical modelling; Evaluation; Summary.
19 Outline: Statistical modelling. Graphical models; Hierarchical Bayesian model; Expectation-maximization algorithm.
23 Graphical Models. Nodes z_1, z_2, ..., z_{m-1}, z_m and y_1, y_2, ..., y_{m-1}, y_m. y_i: vector of performance rates of the i-th application at the different configurations.
29 Hierarchical Bayesian Model. Hidden nodes: z_1, z_2, ..., z_{m-1}, z_m. All applications (observed data): y_1, y_2, ..., y_{m-1}. Target application (partially observed data): y_m. Diagram annotations: couples each of the applications; penalizes large variations in the application. y_i: vector of performance rates of the i-th application at the different configurations.
31 Hierarchical Bayesian Model. Hidden nodes: z_1, z_2, ..., z_{m-1}, z_m. All applications (observed data): y_1, y_2, ..., y_{m-1}, y_m. Diagram annotation: true value of the target application. y_i: vector of performance rates of the i-th application at the different configurations.
36 Expectation Maximization Algorithm. Inputs: observed data, model parameters θ, latent variables. Initialize the model parameters. E-step: create the expected log-likelihood function. M-step: maximize the expected log-likelihood function to obtain θ_new. Repeat with θ_new as the model parameters.
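The E-step/M-step loop can be illustrated with a deliberately simplified model. The sketch below is an assumption-laden stand-in, not LEO's actual hierarchical model: it treats each application's profile as a draw from one multivariate Gaussian with some entries missing, and it omits the separate noise layer between the hidden z_i and the observed y_i. The function name `em_impute` and the `ridge` regularizer are hypothetical choices made for this illustration.

```python
import numpy as np


def em_impute(Y, observed, iters=50, ridge=1e-6):
    """Fill unobserved entries of Y (applications x configurations).

    Y: float array; entries where `observed` is False are ignored.
    observed: boolean array of the same shape as Y.
    """
    m, n = Y.shape
    # Initialization: fill gaps with per-configuration means of observed data.
    counts = np.maximum(observed.sum(axis=0), 1)
    col_mean = np.where(observed, Y, 0.0).sum(axis=0) / counts
    X = np.where(observed, Y, col_mean)
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False, bias=True) + ridge * np.eye(n)
    for _ in range(iters):
        extra = np.zeros((n, n))  # conditional covariance of missing parts
        for i in range(m):
            o, h = observed[i], ~observed[i]
            if not h.any() or not o.any():
                continue
            S_oo = Sigma[np.ix_(o, o)]
            S_ho = Sigma[np.ix_(h, o)]
            K = np.linalg.solve(S_oo, S_ho.T).T  # S_ho @ inv(S_oo)
            # E-step: conditional mean of hidden entries given observed ones.
            X[i, h] = mu[h] + K @ (X[i, o] - mu[o])
            extra[np.ix_(h, h)] += Sigma[np.ix_(h, h)] - K @ S_ho.T
        # M-step: re-estimate mean and covariance from the completed data.
        mu = X.mean(axis=0)
        D = X - mu
        Sigma = (D.T @ D + extra) / m + ridge * np.eye(n)
    return X


# Hypothetical data: three fully profiled applications plus a target that
# was sampled at only two of six configurations.
base = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
truth = np.outer([1.0, 2.0, 3.0, 4.0], base)
observed = np.ones_like(truth, dtype=bool)
observed[3] = [True, False, False, True, False, False]
estimate = em_impute(np.where(observed, truth, np.nan), observed)
```

In this toy data every profile is a scaled copy of the same curve, so the two observed samples of the target are enough for the covariance learned from the other applications to recover the rest of its profile.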
43 Example: kmeans. Different iterations of the EM algorithm for estimating performance rate (in iter/s) vs. cores. Panels: initialization (with the observed samples marked), then EM iterations 1 through 4.
46 LEO (Learning for Energy Optimization). Power: set y_m = observed power; LEO returns p = estimated power. Performance: set y_m = observed performance; LEO returns r = estimated performance. The controller selects the configuration, and new observations are fed back.
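One way a controller could use the estimates p and r is sketched below. This is a simplified illustration under stated assumptions, not the controller from the talk: it picks a single configuration, assumes the system idles at a known `idle_power` after the work completes, and uses hypothetical names (`select_config`, `work`, `deadline`) and hypothetical numbers.

```python
def select_config(power, rate, work, deadline, idle_power):
    """Pick the configuration that minimizes energy while meeting a deadline.

    power[c]: estimated power (W) of configuration c.
    rate[c]:  estimated performance (work units per second) of configuration c.
    Returns (config_index, energy_in_joules), or None if nothing is feasible.
    """
    best = None
    for c, (p, r) in enumerate(zip(power, rate)):
        if r <= 0:
            continue
        busy = work / r                # seconds spent computing
        if busy > deadline:
            continue                   # violates the performance constraint
        energy = p * busy + idle_power * (deadline - busy)
        if best is None or energy < best[1]:
            best = (c, energy)
    return best


# Hypothetical estimates for three configurations.
choice = select_config(power=[100, 150, 220], rate=[10, 20, 25],
                       work=200, deadline=15, idle_power=80)
```

With these numbers the slowest configuration misses the deadline, and the middle one wins at 1900 J against 2320 J for the fastest, which shows why the fastest configuration is not automatically the most energy-efficient one.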
47 Outline: Motivation/Overview; Statistical modelling; Evaluation (Experimental setup; Power and performance estimation; Energy savings / Phase transition); Summary.
49 Evaluation: Experimental Setup. Dual-socket Linux system with a SuperMICRO X9DRL-iF motherboard and two Intel Xeon E processors.
56 Experimental Setup: Configurations (1024 configurations). Clock speed: set using the cpufrequtils package; 15 DVFS settings (from 1.2 to 2.9 GHz) plus TurboBoost, 16 settings in total. Memory controller: the numactl library controls access; 2 memory controllers, 2 settings. Cores: two 8-core processors with hyper-threading, 32 settings. Measurements: Power: a WattsUp meter provides total system power at 1 s intervals. Performance: applications report their heart rate, which is application-specific.
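The configuration count follows from multiplying the three knobs: 16 x 2 x 32 = 1024. A short sketch of enumerating that space is shown here; the setting labels (`dvfs0` ... `dvfs14`, `turbo`) are placeholders, since the slides give only the frequency range and not the individual DVFS steps.

```python
from itertools import product

# Placeholder labels: 15 DVFS steps plus TurboBoost (actual frequencies
# are not listed on the slides).
clock_settings = [f"dvfs{i}" for i in range(15)] + ["turbo"]
memory_controllers = [1, 2]
core_counts = list(range(1, 33))  # up to 2 sockets x 8 cores x 2 threads

# Every combination of (clock, memory controllers, cores).
configs = list(product(clock_settings, memory_controllers, core_counts))
print(len(configs))
```

Each tuple in `configs` can then be given a configuration index, which is the x-axis used in the estimation plots.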
62 Experimental Setup: Benchmarks. We use 25 benchmarks from 3 different suites (PARSEC, Minebench, Rodinia) and some others. Baseline heuristics: Online algorithm: polynomial multivariate regression over configuration values on the observed dataset. Offline algorithm: averages over the rest of the applications to estimate the power and performance of the given application. Race-to-idle: allocates all resources to the application; once it finishes, the system goes idle.
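Two of these baselines are simple enough to sketch directly. The code below is an illustration under assumptions, with hypothetical function names and numbers: `offline_estimate` averages the profiles of the other applications configuration by configuration, and `race_to_idle_energy` charges full power until the work is done and idle power for the rest of the period.

```python
def offline_estimate(other_profiles):
    """Average the profiles of previously seen applications.

    other_profiles: list of equal-length lists, one per application,
    holding power (or performance) at each configuration.
    """
    count = len(other_profiles)
    return [sum(column) / count for column in zip(*other_profiles)]


def race_to_idle_energy(max_power, max_rate, work, period, idle_power):
    """Energy when all resources are used, then the system idles."""
    busy = work / max_rate
    return max_power * busy + idle_power * (period - busy)


# Hypothetical profiles of two other applications at three configurations.
estimate = offline_estimate([[10.0, 20.0, 30.0], [30.0, 40.0, 50.0]])
energy = race_to_idle_energy(max_power=220, max_rate=25,
                             work=200, period=15, idle_power=80)
```

The offline estimate ignores everything specific to the target application, which is the weakness the learned model is meant to address.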
64 Power and performance estimation. Plots: performance rate (in iter/s) vs. configuration index; system power (in Watts) vs. configuration index.
65 Power and performance estimation. Applications shown: Swish (search web server) and x264 (video encoder).
68 Accuracy summary: performance estimation. Comparison of LEO, Online and Offline for Kmeans, for Jacobi, and overall.
69 Accuracy summary: system-power estimation. Comparison of LEO, Online and Offline overall.
71 Summary: Energy savings. Average energy relative to the optimal (over different utilizations and all the benchmarks): LEO +6%, Online +24%, Offline +29%, Race-to-idle +90%.
73 Phase transitions. Performance and power for fluidanimate across phases with different computational demands.
74 Multiple Applications. Comparison of performance estimation (in iter/s) and system-power estimation (in Watts) for different algorithms over a set of application mixtures. Table layout: rows LEO, Online, Offline; columns Mixture 1, Mixture 2 and Overall, once for performance (in iter/s) and once for system power (in Watts).
75 Summary
76 Sensitivity analysis of LEO vs. Online. LEO quickly reaches near-optimality, whereas our baseline method (online regression) cannot operate with fewer than 15 samples because the design matrix of the regression model would be rank-deficient.
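The rank argument can be checked numerically. The sketch below rests on an assumption: the slides do not state the regression's feature map, so a hypothetical degree-2 polynomial over 4 inputs is used because it happens to produce 15 columns. With fewer samples than columns, the design matrix cannot have full column rank, and least squares has no unique solution.

```python
import itertools
import numpy as np


def poly_features(X, degree=2):
    """All monomials of total degree <= `degree` in the columns of X."""
    n_samples, n_vars = X.shape
    columns = []
    for d in range(degree + 1):
        for combo in itertools.combinations_with_replacement(range(n_vars), d):
            col = np.ones(n_samples)
            for j in combo:
                col = col * X[:, j]
            columns.append(col)
    return np.column_stack(columns)


def design_rank(n_samples, n_vars=4, seed=0):
    """Rank of the design matrix built from random sample points."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_samples, n_vars))
    return np.linalg.matrix_rank(poly_features(X))


n_columns = poly_features(np.zeros((1, 4))).shape[1]  # 1 + 4 + 10 = 15
rank_few = design_rank(10)    # fewer samples than columns
rank_many = design_rank(40)   # more samples than columns
```

The rank can never exceed the number of samples, so with 10 samples at most 10 of the 15 coefficients are identifiable.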
77 Related Work. Offline optimization techniques (e.g., [59, 35, 33, 10, 2]) are limited by their reliance on a robust training phase. Online optimization techniques [44]: for example, Flicker is a configurable architecture and optimization framework that uses only online models to maximize performance under a power limitation, and ParallelismDial uses online adaptation to tailor parallelism to the application workload.
More informationPYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads
PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads Ran Xu (Purdue), Subrata Mitra (Adobe Research), Jason Rahman (Facebook), Peter Bai (Purdue),
More informationA Case Study in Optimizing GNU Radio s ATSC Flowgraph
A Case Study in Optimizing GNU Radio s ATSC Flowgraph Presented by Greg Scallon and Kirby Cartwright GNU Radio Conference 2017 Thursday, September 14 th 10am ATSC FLOWGRAPH LOADING 3% 99% 76% 36% 10% 33%
More informationLatent Variable Models and Expectation Maximization
Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15
More informationTowards Energy-Proportional Datacenter Memory with Mobile DRAM
Towards Energy-Proportional Datacenter Memory with Mobile DRAM Krishna Malladi 1 Frank Nothaft 1 Karthika Periyathambi Benjamin Lee 2 Christos Kozyrakis 1 Mark Horowitz 1 Stanford University 1 Duke University
More informationMixture Models and the EM Algorithm
Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is
More informationMassively Parallel Approximation Algorithms for the Knapsack Problem
Massively Parallel Approximation Algorithms for the Knapsack Problem Zhenkuang He Rochester Institute of Technology Department of Computer Science zxh3909@g.rit.edu Committee: Chair: Prof. Alan Kaminsky
More informationSemi-supervised Clustering
Semi-supervised lustering BY: $\ S - MAI AMLT - 2016/2017 (S - MAI) Semi-supervised lustering AMLT - 2016/2017 1 / 26 Outline 1 Semisupervised lustering 2 Semisupervised lustering/labeled Examples 3 Semisupervised
More informationUsing Multiple Machines to Solve Models Faster with Gurobi 6.0
Using Multiple Machines to Solve Models Faster with Gurobi 6.0 Distributed Algorithms in Gurobi 6.0 Gurobi 6.0 includes 3 distributed algorithms Distributed concurrent LP (new in 6.0) MIP Distributed MIP
More informationMeet the Increased Demands on Your Infrastructure with Dell and Intel. ServerWatchTM Executive Brief
Meet the Increased Demands on Your Infrastructure with Dell and Intel ServerWatchTM Executive Brief a QuinStreet Excutive Brief. 2012 Doing more with less is the mantra that sums up much of the past decade,
More informationEnergy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package
High Performance Machine Learning Workshop Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package Matheus Souza, Lucas Maciel, Pedro Penna, Henrique Freitas 24/09/2018 Agenda Introduction
More informationMixture Models and EM
Mixture Models and EM Goal: Introduction to probabilistic mixture models and the expectationmaximization (EM) algorithm. Motivation: simultaneous fitting of multiple model instances unsupervised clustering
More informationObject Detection with Partial Occlusion Based on a Deformable Parts-Based Model
Object Detection with Partial Occlusion Based on a Deformable Parts-Based Model Johnson Hsieh (johnsonhsieh@gmail.com), Alexander Chia (alexchia@stanford.edu) Abstract -- Object occlusion presents a major
More informationParallel and Distributed Optimization with Gurobi Optimizer
Parallel and Distributed Optimization with Gurobi Optimizer Our Presenter Dr. Tobias Achterberg Developer, Gurobi Optimization 2 Parallel & Distributed Optimization 3 Terminology for this presentation
More informationMbench: Benchmarking a Multicore Operating System Using Mixed Workloads
Mbench: Benchmarking a Multicore Operating System Using Mixed Workloads Gang Lu and Xinlong Lin Institute of Computing Technology, Chinese Academy of Sciences BPOE-6, Sep 4, 2015 Backgrounds Fast evolution
More informationTPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage
TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage Performance Study of Microsoft SQL Server 2016 Dell Engineering February 2017 Table of contents
More informationDisclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme
VIRT1052BE Extreme Performance Series: Monster VM Database Performance Todd Muirhead, VMware David Morse, VMware #VMworld #VIRT1052BE Disclaimer This presentation may contain product features that are
More informationDependency detection with Bayesian Networks
Dependency detection with Bayesian Networks M V Vikhreva Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Leninskie Gory, Moscow, 119991 Supervisor: A G Dyakonov
More informationMaking Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010
Making Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010 Windows HPC Server 2008 R2 Windows HPC Server 2008 R2 makes supercomputing
More information732A54/TDDE31 Big Data Analytics
732A54/TDDE31 Big Data Analytics Lecture 10: Machine Learning with MapReduce Jose M. Peña IDA, Linköping University, Sweden 1/27 Contents MapReduce Framework Machine Learning with MapReduce Neural Networks
More informationAn Oracle White Paper September Oracle Utilities Meter Data Management Demonstrates Extreme Performance on Oracle Exadata/Exalogic
An Oracle White Paper September 2011 Oracle Utilities Meter Data Management 2.0.1 Demonstrates Extreme Performance on Oracle Exadata/Exalogic Introduction New utilities technologies are bringing with them
More informationEntuity Network Monitoring and Analytics 10.5 Server Sizing Guide
Entuity Network Monitoring and Analytics 10.5 Server Sizing Guide Table of Contents 1 Introduction 3 2 Server Performance 3 2.1 Choosing a Server... 3 2.2 Supported Server Operating Systems for ENMA 10.5...
More informationSirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers
Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers Johann Hauswald, Michael A. Laurenzano, Yunqi Zhang, Cheng Li, Austin Rovinski,
More informationFEKO Mesh Optimization Study of the EDGES Antenna Panels with Side Lips using a Wire Port and an Infinite Ground Plane
FEKO Mesh Optimization Study of the EDGES Antenna Panels with Side Lips using a Wire Port and an Infinite Ground Plane Tom Mozdzen 12/08/2013 Summary This study evaluated adaptive mesh refinement in the
More informationPowernightmares: The Challenge of Efficiently Using Sleep States on Multi-Core Systems
Powernightmares: The Challenge of Efficiently Using Sleep States on Multi-Core Systems Thomas Ilsche, Marcus Hähnel, Robert Schöne, Mario Bielert, and Daniel Hackenberg Technische Universität Dresden Observation
More informationOutline. Motivation Parallel k-means Clustering Intel Computing Architectures Baseline Performance Performance Optimizations Future Trends
Collaborators: Richard T. Mills, Argonne National Laboratory Sarat Sreepathi, Oak Ridge National Laboratory Forrest M. Hoffman, Oak Ridge National Laboratory Jitendra Kumar, Oak Ridge National Laboratory
More informationECE 571 Advanced Microprocessor-Based Design Lecture 16
ECE 571 Advanced Microprocessor-Based Design Lecture 16 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 21 March 2013 Project Reminder Topic Selection by Tuesday (March 26) Once
More informationA Cool Scheduler for Multi-Core Systems Exploiting Program Phases
IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 5, MAY 2014 1061 A Cool Scheduler for Multi-Core Systems Exploiting Program Phases Zhiming Zhang and J. Morris Chang, Senior Member, IEEE Abstract Rapid growth
More informationPOWER MANAGEMENT AND ENERGY EFFICIENCY
POWER MANAGEMENT AND ENERGY EFFICIENCY * Adopted Power Management for Embedded Systems, Minsoo Ryu 2017 Operating Systems Design Euiseong Seo (euiseong@skku.edu) Need for Power Management Power consumption
More informationDynamic Power Optimization for Higher Server Density Racks A Baidu Case Study with Intel Dynamic Power Technology
Dynamic Power Optimization for Higher Server Density Racks A Baidu Case Study with Intel Dynamic Power Technology Executive Summary Intel s Digital Enterprise Group partnered with Baidu.com conducted a
More informationA Computer Scientist Looks at the Energy Problem
A Computer Scientist Looks at the Energy Problem Randy H. Katz University of California, Berkeley EECS BEARS Symposium February 12, 2009 Energy permits things to exist; information, to behave purposefully.
More informationMaximizing Six-Core AMD Opteron Processor Performance with RHEL
Maximizing Six-Core AMD Opteron Processor Performance with RHEL Bhavna Sarathy Red Hat Technical Lead, AMD Sanjay Rao Senior Software Engineer, Red Hat Sept 4, 2009 1 Agenda Six-Core AMD Opteron processor
More information