LoGA: Low-Overhead GPU Accounting Using Events
|
|
- Abigayle McCormick
- 6 years ago
- Views:
Transcription
1 10th ACM International Systems and Storage Conference, (KIT) KIT The Research University in the Helmholtz Association
2 GPU Sharing GPUs increasingly popular in computing Not every application saturates a GPU time Move GPUs to the cloud Sharing increases cost-efficiency Problem: Need fairness Software scheduling is inefficient 2
3 NEON (University of Rochester, ASPLOS '14) Applies fair queuing to GPUs NEON 3
4 NEON (University of Rochester, ASPLOS '14) Applies fair queuing to GPUs NEON 4
5 NEON (University of Rochester, ASPLOS '14) Applies fair queuing to GPUs NEON 5
6 NEON (University of Rochester, ASPLOS '14) Applies fair queuing to GPUs NEON 6
7 NEON (University of Rochester, ASPLOS '14) Applies fair queuing to GPUs NEON Threshold 7
8 NEON (University of Rochester, ASPLOS '14) Applies fair queuing to GPUs NEON STOP 8
9 NEON: Accounting Problem NEON s accounting disables GPU access GPU idle! STOP Freerun Phase Sampling Phase High accounting overhead if application does not saturate the GPU 9
10 Idea GPUs have lots of status registers for the device driver Leak information about GPU s internal state Reading has no effect on GPU driver or running application Idea: Poll status registers Infer GPU-internal context switches 10
11 Design App STOP Submit Query Poll 11 Scheduler Accounting thread
12 Accounting Thread Accounting thread Poll Status reg 12
13 Accounting Thread Accounting thread Poll Status reg 13
14 Accounting Thread Accounting thread Poll Status reg Poll frequency must be faster than kernel length High CPU load Poll periodically 14
15 Scheduling Scheduling thread queries accounting thread Updates fair queuing counters Stops applications if necessary Different metric than NEON NEON: Average kernel length LoGA: Total GPU time consumed 15
16 Benchmark Scenario Accounting overhead Accuracy of accounting ( scheduling quality) Competing workload: Throttle Creates well-defined GPU load run 16 sleep
17 Accounting Overhead (10% load) 1,6 10 instances of throttle, 1% load each Normalized runtime 1,5 Normalized against execution time w/o accounting 1,4 1,3 Scheduling disabled 1,2 1,1 NEON LoGA 1 0,9 0,8 17 nw cl ef pa ilte th r fin d sr er ad _v 1 sr st ad re am _v2 cl us G eo ter m ea n pa rti nn lu m d er g m pu yo cy te m um cf d dw ga t2d us s he ian ar tw a ho ll ts ho po t ts po t3 hu D ffm hy a br n id so k m rt ea n la s va le MD uk oc yt e bf s b+ tre e ba ck pr op 0,7
18 Accounting Overhead (90% load) 1,6 10 instances of throttle, 9% load each Normalized runtime 1,5 Normalized against execution time w/o accounting 1,4 1,3 Scheduling disabled 1,2 1,1 NEON LoGA 1 0,9 0,8 18 nw cl ef pa ilte th r fin d sr er ad _v 1 sr st ad re am _v2 cl us G eo ter m ea n pa rti nn lu m d er g m pu yo cy te m um cf d dw ga t2d us s he ian ar tw a ho ll ts ho po t ts po t3 hu D ffm hy a br n id so k m rt ea n la s va le MD uk oc yt e bf s b+ tre e ba ck pr op 0,7
19 Fairness Goal: LoGA should not reduce fairness Problem: Which application runtime is fair? Measure total GPU time for each application Calculate optimal scheduling Next slide: Speedup over fair schedule 19
20 Fairness: Results (4x throttle, 20% load each) 5 Speedup over fair schedule 4,5 4 3,5 3 2,5 No scheduling NEON LoGA LoGA-CL 2 1,5 1 0,5 20 km ea ns so rt hy br id an hu ffm ho ts po t ho ts po t3 D he ar tw al l ia n ga us s dw t2 d cf d b+ tre e bf s ba ck pr op 0
21 Conclusion Sharing beneficial if applications do not saturate the GPU Scheduler interference reduces sharing LoGA accounts GPU usage without overhead Poll GPU status registers, detect context switches Time between context switches as input for fair queuing 21
22 Finding Registers Envytools project has documented some registers Unfortunately, far from complete Trial and error: Run workload with known behavior Dump register values See which registers correlate with workload behavior Two registers identified: ID of currently running GPU context Activity status of entire GPU 22
GPrioSwap: Towards a Swapping Policy for GPUs
Jens Kehne, Jonathan Metter, Martin Merkel, Marius Hillenbrand, Marc Rittinghaus, Frank Bellosa 0th ACM International Systems and Storage Conference, (KIT) KIT The Research University in the Helmholtz
More informationLoGA: Low-overhead GPU accounting using events
: Low-overhead GPU accounting using events Jens Kehne Stanislav Spassov Marius Hillenbrand Marc Rittinghaus Frank Bellosa Karlsruhe Institute of Technology (KIT) Operating Systems Group os@itec.kit.edu
More informationENDF/B-VII.1 versus ENDFB/-VII.0: What s Different?
LLNL-TR-548633 ENDF/B-VII.1 versus ENDFB/-VII.0: What s Different? by Dermott E. Cullen Lawrence Livermore National Laboratory P.O. Box 808/L-198 Livermore, CA 94550 March 17, 2012 Approved for public
More informationRow 1 This is data This is data
mpdf TABLES CSS Styles The CSS properties for tables and cells is increased over that in html2fpdf. It includes recognition of THEAD, TFOOT and TH. See below for other facilities such as autosizing, and
More informationRow 1 This is data This is data. This is data out of p This is bold data p This is bold data out of p This is normal data after br H3 in a table
mpdf TABLES CSS Styles The CSS properties for tables and cells is increased over that in html2fpdf. It includes recognition of THEAD, TFOOT and TH. See below for other facilities such as autosizing, and
More informationPrices exclude Freight and GST Page 1 of 9. Range Group Image (not to scale) Part Number Description Retail
Pric exclude Freight and GST Page 1 of 9 AEMMSFA101 ECONOMIC MONITOR ARM BLACK - $119 AEMMSLA01 LCD MONITOR ARM GREY - CLAMP $249 AEMMSLA02 LCD MONITOR ARM GREY - CLAMP $249 r Acc ce ss so ori e om mp
More informationArachne. Core Aware Thread Management Henry Qin Jacqueline Speiser John Ousterhout
Arachne Core Aware Thread Management Henry Qin Jacqueline Speiser John Ousterhout Granular Computing Platform Zaharia Winstein Levis Applications Kozyrakis Cluster Scheduling Ousterhout Low-Latency RPC
More informationSPAREPARTSCATALOG: CONNECTORS SPARE CONNECTORS KTM ART.-NR.: 3CM EN
SPAREPARTSCATALOG: CONNECTORS ART.-NR.: 3CM3208201EN CONTENT SPARE CONNECTORS AA-AN SPARE CONNECTORS AO-BC SPARE CONNECTORS BD-BQ SPARE CONNECTORS BR-CD 3 4 5 6 SPARE CONNECTORS CE-CR SPARE CONNECTORS
More informationData Placement Optimization in GPU Memory Hierarchy Using Predictive Modeling
Data Placement Optimization in GPU Memory Hierarchy Using Predictive Modeling Larisa Stoltzfus*, Murali Emani, Pei-Hung Lin, Chunhua Liao *University of Edinburgh (UK), Lawrence Livermore National Laboratory
More informationImproving Cloud Application Performance with Simulation-Guided CPU State Management
Improving Cloud Application Performance with Simulation-Guided CPU State Management Mathias Gottschlag, Frank Bellosa April 23, 2017 KARLSRUHE INSTITUTE OF TECHNOLOGY (KIT) - OPERATING SYSTEMS GROUP KIT
More informationSPARE CONNECTORS KTM 2014
SPAREPARTSCATALOG: // ENGINE ART.-NR.: 3208201EN CONTENT CONNECTORS FOR WIRING HARNESS AA-AN CONNECTORS FOR WIRING HARNESS AO-BC CONNECTORS FOR WIRING HARNESS BD-BQ CONNECTORS FOR WIRING HARNESS BR-CD
More informationPredicting Program Phases and Defending against Side-Channel Attacks using Hardware Performance Counters
Predicting Program Phases and Defending against Side-Channel Attacks using Hardware Performance Counters Junaid Nomani and Jakub Szefer Computer Architecture and Security Laboratory Yale University junaid.nomani@yale.edu
More informationTraining manual: An introduction to North Time Pro 2019 ESS at the terminal
Training manual: An introduction to North Time Pro 2019 ESS at the terminal Document t2-0800 Revision 15.1 Copyright North Time Pro www.ntdltd.com +44 (0) 2892 604000 For more information about North
More informationMace, J. et al., "2DFQ: Two-Dimensional Fair Queuing for Multi-Tenant Cloud Services," Proc. of ACM SIGCOMM '16, 46(4): , Aug
Mace, J. et al., "2DFQ: Two-Dimensional Fair Queuing for Multi-Tenant Cloud Services," Proc. of ACM SIGCOMM '16, 46(4):144-159, Aug. 2016. Introduction Server has limited capacity Requests by clients are
More informationExploring the Throughput-Fairness Trade-off on Asymmetric Multicore Systems
Exploring the Throughput-Fairness Trade-off on Asymmetric Multicore Systems J.C. Sáez, A. Pousa, F. Castro, D. Chaver y M. Prieto Complutense University of Madrid, Universidad Nacional de la Plata-LIDI
More informationA method to vary the Host interface signaling speeds in a Storage Array driving towards Greener Storage.
A method to vary the Host interface signaling speeds in a Storage Array driving towards Greener Storage. Dr. M. K. Jibbe, Technical Director NetApp, Wichita, Ks USA Mahmoud.jibbe@netapp.com 316 636 8810
More informationDatacenter application interference
1 Datacenter application interference CMPs (popular in datacenters) offer increased throughput and reduced power consumption They also increase resource sharing between applications, which can result in
More informationICP-OES. By: Dr. Sarhan A. Salman
ICP-OES By: Dr. Sarhan A. Salman AGENDA -- ICP-OES Overview - Module removal Optical components and gas control overview Module replacement --Continue module replacement Manual optical alignment --Precise
More informationResource-Conscious Scheduling for Energy Efficiency on Multicore Processors
Resource-Conscious Scheduling for Energy Efficiency on Andreas Merkel, Jan Stoess, Frank Bellosa System Architecture Group KIT The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe
More informationAdaptive Power Profiling for Many-Core HPC Architectures
Adaptive Power Profiling for Many-Core HPC Architectures Jaimie Kelley, Christopher Stewart The Ohio State University Devesh Tiwari, Saurabh Gupta Oak Ridge National Laboratory State-of-the-Art Schedulers
More informationContention-Aware Scheduling of Parallel Code for Heterogeneous Systems
Contention-Aware Scheduling of Parallel Code for Heterogeneous Systems Chris Gregg Jeff S. Brantley Kim Hazelwood Department of Computer Science, University of Virginia Abstract A typical consumer desktop
More informationActivity concentration for exempt material (Bq/g)
173.436 Exempt material activity concentrations and exempt consignment activity limits for radionuclides. The Table of Exempt material activity concentrations and exempt consignment activity limits for
More informationUNLOCKING BANDWIDTH FOR GPUS IN CC-NUMA SYSTEMS
UNLOCKING BANDWIDTH FOR GPUS IN CC-NUMA SYSTEMS Neha Agarwal* David Nellans Mike O Connor Stephen W. Keckler Thomas F. Wenisch* NVIDIA University of Michigan* (Major part of this work was done when Neha
More informationINTERFERENCE FROM GPU SYSTEM SERVICE REQUESTS
INTERFERENCE FROM GPU SYSTEM SERVICE REQUESTS ARKAPRAVA BASU, JOSEPH L. GREATHOUSE, GURU VENKATARAMANI, JÁN VESELÝ AMD RESEARCH, ADVANCED MICRO DEVICES, INC. MODERN SYSTEMS ARE POWERED BY HETEROGENEITY
More informationSpeculative Synchronization: Applying Thread Level Speculation to Parallel Applications. University of Illinois
Speculative Synchronization: Applying Thread Level Speculation to Parallel Applications José éf. Martínez * and Josep Torrellas University of Illinois ASPLOS 2002 * Now at Cornell University Overview Allow
More informationEECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun
EECS750: Advanced Operating Systems 2/24/2014 Heechul Yun 1 Administrative Project Feedback of your proposal will be sent by Wednesday Midterm report due on Apr. 2 3 pages: include intro, related work,
More informationA Distributed Hash Table for Shared Memory
A Distributed Hash Table for Shared Memory Wytse Oortwijn Formal Methods and Tools, University of Twente August 31, 2015 Wytse Oortwijn (Formal Methods and Tools, AUniversity Distributed of Twente) Hash
More informationA Scalable Event Dispatching Library for Linux Network Servers
A Scalable Event Dispatching Library for Linux Network Servers Hao-Ran Liu and Tien-Fu Chen Dept. of CSIE National Chung Cheng University Traditional server: Multiple Process (MP) server A dedicated process
More informationACCELERATED COMPLEX EVENT PROCESSING WITH GRAPHICS PROCESSING UNITS
ACCELERATED COMPLEX EVENT PROCESSING WITH GRAPHICS PROCESSING UNITS Prabodha Srimal Rodrigo Registration No. : 138230V Degree of Master of Science Department of Computer Science & Engineering University
More informationCider. Native Execution of ios Apps on Android. Alexander Van't Hof, Naser AlDuaij, Christoffer Dall, Nicolas Viennot, Jason Nieh. Columbia University
Cider Native Execution of Apps on Jeremy Andrus (jeremya@cs.columbia.edu) Alexander Van't Hof, Naser AlDuaij, Christoffer Dall, Nicolas Viennot, Jason Nieh Columbia University In the City of New York March
More informationBatch Jobs Performance Testing
Batch Jobs Performance Testing October 20, 2012 Author Rajesh Kurapati Introduction Batch Job A batch job is a scheduled program that runs without user intervention. Corporations use batch jobs to automate
More informationProdDiagNode - Version: 1. Production Diagnostics for Node Applications
ProdDiagNode - Version: 1 Production Diagnostics for Node Applications Production Diagnostics for Node Applications ProdDiagNode - Version: 1 2 days Course Description: Node.js, the popular cross-platform
More informationCSharp. Microsoft. PRO-Design & Develop Enterprise Appl by Using MS.NET Frmwk
Microsoft 70-549-CSharp PRO-Design & Develop Enterprise Appl by Using MS.NET Frmwk Download Full Version : http://killexams.com/pass4sure/exam-detail/70-549-csharp QUESTION: 170 You are an enterprise application
More informationCritically Missing Pieces on Accelerators: A Performance Tools Perspective
Critically Missing Pieces on Accelerators: A Performance Tools Perspective, Karthik Murthy, Mike Fagan, and John Mellor-Crummey Rice University SC 2013 Denver, CO November 20, 2013 What Is Missing in GPUs?
More informationPerformance and Scalability of Server Consolidation
Performance and Scalability of Server Consolidation August 2010 Andrew Theurer IBM Linux Technology Center Agenda How are we measuring server consolidation? SPECvirt_sc2010 How is KVM doing in an enterprise
More informationGaaS Workload Characterization under NUMA Architecture for Virtualized GPU
GaaS Workload Characterization under NUMA Architecture for Virtualized GPU Huixiang Chen, Meng Wang, Yang Hu, Mingcong Song, Tao Li Presented by Huixiang Chen ISPASS 2017 April 24, 2017, Santa Rosa, California
More informationHeterogeneous-Race-Free Memory Models
Heterogeneous-Race-Free Memory Models Jyh-Jing (JJ) Hwang, Yiren (Max) Lu 02/28/2017 1 Outline 1. Background 2. HRF-direct 3. HRF-indirect 4. Experiments 2 Data Race Condition op1 op2 write read 3 Sequential
More informationOn the Accuracy and Scalability of Intensive I/O Workload Replay
On the Accuracy and Scalability of Intensive I/O Workload Replay Alireza Haghdoost, Weiping He, Jerry Fredin *, David H.C. Du University of Minnesota, * NetApp Inc. Center for Research in Intelligent Storage
More informationKEYNOTE FOR
THE OS SCHEDULER: A PERFORMANCE-CRITICAL COMPONENT IN LINUX CLUSTER ENVIRONMENTS KEYNOTE FOR BPOE-9 @ASPLOS2018 THE NINTH WORKSHOP ON BIG DATA BENCHMARKS, PERFORMANCE, OPTIMIZATION AND EMERGING HARDWARE
More informationM a p s 2. 0 T h e c o o p era tive w ay t o ke e p m a p s u p t o d a t e. O l iver. Kü h n
M a p s 2. 0 T h e c o o p era tive w ay t o ke e p m a p s u p t o d a t e O l iver Kü h n s k obble r G m b H M a i 2 0 1 3 This is a stor y about how we all get unconscious ly involved in creating digital
More informationibench: Quantifying Interference in Datacenter Applications
ibench: Quantifying Interference in Datacenter Applications Christina Delimitrou and Christos Kozyrakis Stanford University IISWC September 23 th 2013 Executive Summary Problem: Increasing utilization
More informationDIDO: Dynamic Pipelines for In-Memory Key-Value Stores on Coupled CPU-GPU Architectures
DIDO: Dynamic Pipelines for In-Memory Key-Value Stores on Coupled CPU-GPU rchitectures Kai Zhang, Jiayu Hu, Bingsheng He, Bei Hua School of Computing, National University of Singapore School of Computer
More informationBolt: I Know What You Did Last Summer In the Cloud
Bolt: I Know What You Did Last Summer In the Cloud Christina Delimitrou 1 and Christos Kozyrakis 2 1 Cornell University, 2 Stanford University ASPLOS April 12 th 2017 Executive Summary Problem: cloud resource
More informationRegister and Thread Structure Optimization for GPUs Yun (Eric) Liang, Zheng Cui, Kyle Rupnow, Deming Chen
Register and Thread Structure Optimization for GPUs Yun (Eric) Liang, Zheng Cui, Kyle Rupnow, Deming Chen Peking University, China Advanced Digital Science Center, Singapore University of Illinois at Urbana
More informationSIMD Divergence Optimization through Intra-Warp Compaction. Aniruddha Vaidya Anahita Shayesteh Dong Hyuk Woo Roy Saharoy Mani Azimi ISCA 13
SIMD Divergence Optimization through Intra-Warp Compaction Aniruddha Vaidya Anahita Shayesteh Dong Hyuk Woo Roy Saharoy Mani Azimi ISCA 13 Problem GPU: wide SIMD lanes 16 lanes per warp in this work SIMD
More informationA Case for Core-Assisted Bottleneck Acceleration in GPUs Enabling Flexible Data Compression with Assist Warps
A Case for Core-Assisted Bottleneck Acceleration in GPUs Enabling Flexible Data Compression with Assist Warps Nandita Vijaykumar Gennady Pekhimenko, Adwait Jog, Abhishek Bhowmick, Rachata Ausavarangnirun,
More informationBigDataBench-MT: Multi-tenancy version of BigDataBench
BigDataBench-MT: Multi-tenancy version of BigDataBench Gang Lu Beijing Academy of Frontier Science and Technology BigDataBench Tutorial, ASPLOS 2016 Atlanta, GA, USA n Software perspective Multi-tenancy
More informationLionbridge ondemand for Adobe Experience Manager
Lionbridge ondemand for Adobe Experience Manager Version 1.1.0 Configuration Guide October 24, 2017 Copyright Copyright 2017 Lionbridge Technologies, Inc. All rights reserved. Published in the USA. March,
More informationBasics of Performance Engineering
ERLANGEN REGIONAL COMPUTING CENTER Basics of Performance Engineering J. Treibig HiPerCH 3, 23./24.03.2015 Why hardware should not be exposed Such an approach is not portable Hardware issues frequently
More informationBFQ, fairness and low latency in block I/O The Two Towers. Paolo Valente
BFQ, fairness and low latency in block I/O The Two Towers Paolo Valente Contents Why BFQ? What happened since the last episode? What is still to come? The two Towers How is it going in general with latency
More informationRS-232C Interface Command Table
RS-232C Interface Command Table EN 73 Lower OrderN Higher OrderQ 0 1 2 3 4 5 6 7 8 9 A B C D E F 0 Complete Error Cassette Out Not Target ACK NAK 1 2 3 VCR Selection DVD Selection Play Stop 4 Still 5 Clear
More informationEE282 Computer Architecture. Lecture 1: What is Computer Architecture?
EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer
More informationEnterprise. Breadth-First Graph Traversal on GPUs. November 19th, 2015
Enterprise Breadth-First Graph Traversal on GPUs Hang Liu H. Howie Huang November 9th, 5 Graph is Ubiquitous Breadth-First Search (BFS) is Important Wide Range of Applications Single Source Shortest Path
More informationWarped-Preexecution: A GPU Pre-execution Approach for Improving Latency Hiding
Warped-Preexecution: A GPU Pre-execution Approach for Improving Latency Hiding Keunsoo Kim, Sangpil Lee, Myung Kuk Yoon, *Gunjae Koo, Won Woo Ro, *Murali Annavaram Yonsei University *University of Southern
More informationInstruction Sets: Characteristics and Functions Addressing Modes
Instruction Sets: Characteristics and Functions Addressing Modes Chapters 10 and 11, William Stallings Computer Organization and Architecture 7 th Edition What is an Instruction Set? The complete collection
More informationMultitasking and scheduling
Multitasking and scheduling Guillaume Salagnac Insa-Lyon IST Semester Fall 2017 2/39 Previously on IST-OPS: kernel vs userland pplication 1 pplication 2 VM1 VM2 OS Kernel rchitecture Hardware Each program
More informationDesigning High-Performance and Fair Shared Multi-Core Memory Systems: Two Approaches. Onur Mutlu March 23, 2010 GSRC
Designing High-Performance and Fair Shared Multi-Core Memory Systems: Two Approaches Onur Mutlu onur@cmu.edu March 23, 2010 GSRC Modern Memory Systems (Multi-Core) 2 The Memory System The memory system
More informationSTEPS Towards Cache-Resident Transaction Processing
STEPS Towards Cache-Resident Transaction Processing Stavros Harizopoulos joint work with Anastassia Ailamaki VLDB 2004 Carnegie ellon CPI OLTP workloads on modern CPUs 6 4 2 L2-I stalls L2-D stalls L1-I
More informationFLIGHTS TO / FROM CANADA ARE DOMESTIC
MINIMUM CONNECTING TIME Houston, USA FLIGHTS TO / FROM CANADA ARE DOMESTIC IAH (Intercontinental Airport) DOMESTIC TO DOMESTIC :45 AA TO DL :40 CO TO AC :30 AA, UA, US :20 CO :30 DL, WN :25 DOMESTIC TO
More informationPerformance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference
The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee
More informationAdaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems
Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems Ben Taylor, Vicent Sanz Marco, Zheng Wang School of Computing and Communications, Lancaster University, UK {b.d.taylor, v.sanzmarco,
More informationVulkan: Scaling to Multiple Threads. Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics
Vulkan: Scaling to Multiple Threads Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics www.imgtec.com Introduction Who am I? Kevin Sun Working at Imagination Technologies Take responsibility
More informationTowards Fair and Efficient SMP Virtual Machine Scheduling
Towards Fair and Efficient SMP Virtual Machine Scheduling Jia Rao and Xiaobo Zhou University of Colorado, Colorado Springs http://cs.uccs.edu/~jrao/ Executive Summary Problem: unfairness and inefficiency
More informationReducing Network Contention with Mixed Workloads on Modern Multicore Clusters
Reducing Network Contention with Mixed Workloads on Modern Multicore Clusters Matthew Koop 1 Miao Luo D. K. Panda matthew.koop@nasa.gov {luom, panda}@cse.ohio-state.edu 1 NASA Center for Computational
More informationPYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads
PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads Ran Xu (Purdue), Subrata Mitra (Adobe Research), Jason Rahman (Facebook), Peter Bai (Purdue),
More informationParallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010
Parallelizing FPGA Technology Mapping using GPUs Doris Chen Deshanand Singh Aug 31 st, 2010 Motivation: Compile Time In last 12 years: 110x increase in FPGA Logic, 23x increase in CPU speed, 4.8x gap Question:
More informationBenchmark Study: A Performance Comparison Between RHEL 5 and RHEL 6 on System z
Benchmark Study: A Performance Comparison Between RHEL 5 and RHEL 6 on System z 1 Lab Environment Setup Hardware and z/vm Environment z10 EC, 4 IFLs used (non concurrent tests) Separate LPARs for RHEL
More informationQoS-Aware Admission Control in Heterogeneous Datacenters
QoS-Aware Admission Control in Heterogeneous Datacenters Christina Delimitrou, Nick Bambos and Christos Kozyrakis Stanford University ICAC June 28 th 2013 Cloud DC Scheduling Workloads DC Scheduler S S
More informationPosition Paper: OpenMP scheduling on ARM big.little architecture
Position Paper: OpenMP scheduling on ARM big.little architecture Anastasiia Butko, Louisa Bessad, David Novo, Florent Bruguier, Abdoulaye Gamatié, Gilles Sassatelli, Lionel Torres, and Michel Robert LIRMM
More informationRed Fox: An Execution Environment for Relational Query Processing on GPUs
Red Fox: An Execution Environment for Relational Query Processing on GPUs Haicheng Wu 1, Gregory Diamos 2, Tim Sheard 3, Molham Aref 4, Sean Baxter 2, Michael Garland 2, Sudhakar Yalamanchili 1 1. Georgia
More informationLocality-Aware Software Throttling for Sparse Matrix Operation on GPUs
Locality-Aware Software Throttling for Sparse Matrix Operation on GPUs Yanhao Chen 1, Ari B. Hayes 1, Chi Zhang 2, Timothy Salmon 1, Eddy Z. Zhang 1 1. Rutgers University 2. University of Pittsburgh Sparse
More informationPROCESS SCHEDULING II. CS124 Operating Systems Fall , Lecture 13
PROCESS SCHEDULING II CS124 Operating Systems Fall 2017-2018, Lecture 13 2 Real-Time Systems Increasingly common to have systems with real-time scheduling requirements Real-time systems are driven by specific
More informationCl. Cl. ..., V, V, -I..., - QJ d.
)> Cl. Cl...., m V, V, -I..., QJ :J V, - QJ d. 0 :J Main Points Address Translation Concept - How do we convert a virtual address to a physical address? Flexible Address Translation I R. lz G.z: f1..t:?z:.mo/2_'-(
More informationWorkload Characterization and Optimization of TPC-H Queries on Apache Spark
Workload Characterization and Optimization of TPC-H Queries on Apache Spark Tatsuhiro Chiba and Tamiya Onodera IBM Research - Tokyo April. 17-19, 216 IEEE ISPASS 216 @ Uppsala, Sweden Overview IBM Research
More informationAmir Zipory Senior Solutions Architect, Redhat Israel, Greece & Cyprus
Amir Zipory Senior Solutions Architect, Redhat Israel, Greece & Cyprus amirz@redhat.com TODAY'S IT CHALLENGES IT is under tremendous pressure from the organization to enable growth Need to accelerate,
More informationMulti-Level Feedback Queues
CS 326: Operating Systems Multi-Level Feedback Queues Lecture 8 Today s Schedule Building an Ideal Scheduler Priority-Based Scheduling Multi-Level Queues Multi-Level Feedback Queues Scheduling Domains
More informationTMCH Report March February 2017
TMCH Report March 2013 - February 2017 Contents Contents 2 1 Trademark Clearinghouse global reporting 3 1.1 Number of jurisdictions for which a trademark record has been submitted for 3 2 Trademark Clearinghouse
More informationProactive System Check Supported Product List as of 14 April 2017
Proactive System Check Supported Product List as of 14 April 2017 Expansions - no PSC price as attached within base unit price Tape drives - no PSC price as attached within base unit price Racks - no PSC
More informationStrong Scaling for Molecular Dynamics Applications
Strong Scaling for Molecular Dynamics Applications Scaling and Molecular Dynamics Strong Scaling How does solution time vary with the number of processors for a fixed problem size Classical Molecular Dynamics
More informationDatabase Acceleration Solution Using FPGAs and Integrated Flash Storage
Database Acceleration Solution Using FPGAs and Integrated Flash Storage HK Verma, Xilinx Inc. August 2017 1 FPGA Analytics in Flash Storage System In-memory or Flash storage based DB reduce disk access
More informationSCALABILITY AND HETEROGENEITY MICHAEL ROITZSCH
Faculty of Computer Science Institute of Systems Architecture, Operating Systems Group SCALABILITY AND HETEROGENEITY MICHAEL ROITZSCH LAYER CAKE Application Runtime OS Kernel ISA Physical RAM 2 COMMODITY
More informationCD _. _. 'p ~~M CD, CD~~~~V. C ~'* Co ~~~~~~~~~~~~- CD / X. pd.0 & CD. On 0 CDC _ C _- CD C P O ttic 2 _. OCt CD CD (IQ. q"3. 3 > n)1t.
n 5 L n q"3 +, / X g ( E 4 11 " ') $ n 4 ) w Z$ > _ X ~'* ) i 1 _ /3 L 2 _ L 4 : 5 n W 9 U~~~~~~ 5 T f V ~~~~~~~~~~~~ (Q ' ~~M 3 > n)1 % ~~~~V v,~~ _ + d V)m X LA) z~~11 4 _ N cc ', f 'd 4 5 L L " V +,
More informationParallel Processing for Data Deduplication
Parallel Processing for Data Deduplication Peter Sobe, Denny Pazak, Martin Stiehr Faculty of Computer Science and Mathematics Dresden University of Applied Sciences Dresden, Germany Corresponding Author
More informationPerformance Baseline of Oracle Exadata X2-2 HR HC
Performance Baseline of Oracle Exadata X2-2 HR HC Part I: CPU Performance Benchware Performance Suite Release 8.4 (Build 130630) June 2013 Contents 1 Introduction to CPU Performance Tests 2 CPU and Server
More informationCS 326: Operating Systems. CPU Scheduling. Lecture 6
CS 326: Operating Systems CPU Scheduling Lecture 6 Today s Schedule Agenda? Context Switches and Interrupts Basic Scheduling Algorithms Scheduling with I/O Symmetric multiprocessing 2/7/18 CS 326: Operating
More informationProactive System Check Supported Product List as of March 27, 2018
Proactive System Check Supported Product List as of March 27, 2018 Expansions - no PSC price as attached within base unit price Tape drives - no PSC price as attached within base unit price Racks - no
More informationMultiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448. The Greed for Speed
Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448 1 The Greed for Speed Two general approaches to making computers faster Faster uniprocessor All the techniques we ve been looking
More informationStaged Memory Scheduling
Staged Memory Scheduling Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel H. Loh*, Onur Mutlu Carnegie Mellon University, *AMD Research June 12 th 2012 Executive Summary Observation:
More informationMay 1, Foundation for Research and Technology - Hellas (FORTH) Institute of Computer Science (ICS) A Sleep-based Communication Mechanism to
A Sleep-based Our Akram Foundation for Research and Technology - Hellas (FORTH) Institute of Computer Science (ICS) May 1, 2011 Our 1 2 Our 3 4 5 6 Our Efficiency in Back-end Processing Efficiency in back-end
More informationOPAL-RT FIU User Guide.
OPAL-RT FIU User Guide www.opal-rt.com Published by Opal-RT Technologies, Inc. 1751 Richardson, suite 2525 Montreal, Quebec, Canada H3K 1G6 www.opal-rt.com 2013 Opal-RT Technologies, Inc. All rights reserved
More informationECE 571 Advanced Microprocessor-Based Design Lecture 7
ECE 571 Advanced Microprocessor-Based Design Lecture 7 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 9 February 2017 Announcements HW#4 will be posted, some readings 1 Measuring
More informationEvaluating the Potential of Graphics Processors for High Performance Embedded Computing
Evaluating the Potential of Graphics Processors for High Performance Embedded Computing Shuai Mu, Chenxi Wang, Ming Liu, Yangdong Deng Department of Micro-/Nano-electronics Tsinghua University Outline
More informationEvaluating the Error Resilience of Parallel Programs
Evaluating the Error Resilience of Parallel Programs Bo Fang, Karthik Pattabiraman, Matei Ripeanu, The University of British Columbia Sudhanva Gurumurthi AMD Research 1 Reliability trends The soft error
More informationPerformance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals
Performance COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals What is Performance? How do we measure the performance of
More informationScorciatoie da tastiera - Circad 4.09
Scorciatoie da tastiera - Circad 4.09 File FE New File - File FO Open File - File FR Re-Open - File FT Total Recall - File FD Load - File FC Close File - File FE Erase File - File FS Save File - File FA
More informationHuge market -- essentially all high performance databases work this way
11/5/2017 Lecture 16 -- Parallel & Distributed Databases Parallel/distributed databases: goal provide exactly the same API (SQL) and abstractions (relational tables), but partition data across a bunch
More informationSynchronization. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Synchronization Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Types of Synchronization Mutual Exclusion Locks Event Synchronization Global or group-based
More informationMediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency
MediaTek CorePilot 2.0 Heterogeneous Computing Technology Delivering extreme compute performance with maximum power efficiency In July 2013, MediaTek delivered the industry s first mobile system on a chip
More informationGLocks: Efficient Support for Highly- Contended Locks in Many-Core CMPs
GLocks: Efficient Support for Highly- Contended Locks in Many-Core CMPs Authors: Jos e L. Abell an, Juan Fern andez and Manuel E. Acacio Presenter: Guoliang Liu Outline Introduction Motivation Background
More informationMonitoring Simplified.
Monitoring Simplified http://nsclient.org How many use NSClient++ NS-what did he say??#@*&%! wrong room! How many like NSClient++?..pdh collection thread not running ERROR: Missing argument exception PdhCollectQueryData?
More information