Leveraging Hybrid Hardware in New Ways: The GPU Paging Cache
Transcription
1 Leveraging Hybrid Hardware in New Ways: The GPU Paging Cache Frank Feinbube, Peter Tröger, Johannes Henning, Andreas Polze Hasso Plattner Institute Operating Systems and Middleware Prof. Dr. Andreas Polze
2 Everyone should benefit from all available Compute Devices
3 Call for Action Operating Systems must support GPU abstractions Rossbach, Currey, Witchel [HotOS11]
4 Operating System Perspective
Operating Systems must support GPU abstractions. Rossbach, Currey, Witchel [HotOS11]
Accelerators are still a black box:
- No appropriate abstractions
- ioctl based interface: unknown architecture, unknown capabilities
- No system-wide guarantees
- Coarse grained scheduling
- No isolation guarantees, no protection guarantees
5 Related Work: GPU Computing to speed up Operating System Services
Operating Systems must support GPU abstractions. Rossbach, Currey, Witchel [HotOS11]
- PTask API by Rossbach et al. (based on CUDA): workload as acyclic graphs of ptasks executed on GPU; proof-of-concept: encryption / decryption for Linux EncFS
- GPUStore by Sun et al. (based on CUDA): specially tailored for storage systems; GPUs for encryption and RAID functionality
- Gdev framework by Kato et al. (based on Direct Rendering Int.): reverse engineering of graphics card driver internals
The GPU Paging Cache ICPADS 2013 Frank Feinbube
8 What about the memory?
9 Hardware Survey of Memory Sizes
System RAM Distribution (Steam HW Survey May 2012):
- less than 2 GB: 6,71%
- 2 GB: 16,77%
- 3 GB: 22,65%
- 4 GB: 23,64%
- 5 or more GB: 30,76%
Summary: 70% less than 5 GB
VRAM Distribution (Steam HW Survey May 2012):
- 256 MB: 10,68%
- 512 MB: 19,44%
- 1024 MB: 41,27%
- more than 1024 MB: 8,93%
- Other: 19,68%
Summary: 50% at least 1 GB
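The two summary figures on this slide can be checked against the listed shares. The following is a minimal Python sketch; the percentages are copied from the slide (with the decimal comma written as a point), while the dictionary and variable names are illustrative assumptions.

```python
# Shares in percent, copied from the slide (Steam HW Survey May 2012).
system_ram = {
    "less than 2 GB": 6.71,
    "2 GB": 16.77,
    "3 GB": 22.65,
    "4 GB": 23.64,
    "5 or more GB": 30.76,
}
vram = {
    "256 MB": 10.68,
    "512 MB": 19.44,
    "1024 MB": 41.27,
    "more than 1024 MB": 8.93,
    "Other": 19.68,
}

# Share of systems with less than 5 GB of RAM (the slide rounds this to 70%).
ram_below_5gb = sum(
    share for label, share in system_ram.items() if label != "5 or more GB"
)

# Share of systems with at least 1 GB of VRAM (the slide rounds this to 50%).
vram_at_least_1gb = vram["1024 MB"] + vram["more than 1024 MB"]

print(f"RAM below 5 GB: {ram_below_5gb:.2f}%")
print(f"VRAM of at least 1 GB: {vram_at_least_1gb:.2f}%")
```

Read this way, roughly half of the surveyed machines carry at least 1 GB of GPU memory, which is the capacity the paging cache idea relies on.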
10 NVIDIA Quadro K GB of Memory!!
11 The Idea
12 GPU Paging Cache The vision: GPU memory is used as a fast cache for paging information written to secondary storage.
13 Is Paging still a problem?
14 Page Faults still occur
"Software is a gas. Software always expands to fit whatever container it is stored in." Nathan Myhrvold, the former CTO of Microsoft
15 First Approach: Windows Research Kernel
16 Windows Research Kernel (WRK)
- Stripped down Windows Server 2003 sources
- Only kernel itself, no drivers, GUI, user-mode components
- Missing components: HAL, power management, plug-and-play
- Released in 2006
- Freely available to academic institutions
- Encouraged by license: modification, publication (of excerpts)
17 Second Approach: File System Driver
18 [Architecture diagram]
User Mode: Application (x4)
Kernel Mode: Pagefile, IO Manager, GPU Paging Cache Driver (marked "?")
Hardware: Disk, RAM, GPU Memory
19 GPU Paging Cache Driver [Flowchart]
Start: initialize driver, then wait for IO request (transitions labelled "IO Request").
Steps: analyze IO Request (read request / write request); GPU active? (yes / no); in GPU cache? (yes / no); fetch from disk; fetch from GPU cache; write to GPU cache; write to disk; async write to disk; copy result to IO Request.
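One possible reading of the driver flowchart on slide 19 can be sketched in Python. This is a toy model under stated assumptions, not the driver itself: the branch order is inferred from the node labels, dictionaries stand in for the pagefile and GPU memory, and the class name, method names, and the explicit `flush` step (modelling the asynchronous disk write) are all hypothetical. Eviction is not modelled.

```python
class GpuPagingCacheSketch:
    """Toy model of the flowchart; dicts stand in for disk and GPU memory."""

    def __init__(self, gpu_active=True):
        self.gpu_active = gpu_active
        self.disk = {}                 # page number -> bytes (the pagefile)
        self.gpu_cache = {}            # page number -> bytes (GPU memory)
        self.pending_disk_writes = []  # models "async write to disk"

    def read(self, page):
        # Read request: serve from the GPU cache when possible.
        if self.gpu_active and page in self.gpu_cache:
            return self.gpu_cache[page]      # fetch from GPU cache
        data = self.disk[page]               # fetch from disk
        if self.gpu_active:
            self.gpu_cache[page] = data      # write to GPU cache
        return data

    def write(self, page, data):
        # Write request: cache on the GPU and defer the disk write,
        # or write straight to disk when the GPU is not active.
        if self.gpu_active:
            self.gpu_cache[page] = data
            self.pending_disk_writes.append((page, data))
        else:
            self.disk[page] = data

    def flush(self):
        # Completes the deferred disk writes (assumed to run asynchronously).
        for page, data in self.pending_disk_writes:
            self.disk[page] = data
        self.pending_disk_writes.clear()


cache = GpuPagingCacheSketch(gpu_active=True)
cache.write(7, b"page-7")
first = cache.read(7)   # served from the GPU cache
cache.flush()           # the page now also exists on disk
```

In this reading the disk always ends up with a copy of every written page, so the GPU copy acts purely as a cache and can be dropped without losing data once the deferred writes have completed.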
20 [Architecture diagram]
User Mode: Application (x4), Usermode Component (OpenCL); label: Upcall
Kernel Mode: Pagefile, IO Manager, GPU Paging Cache Driver, GPU Driver
Hardware: Disk, RAM, GPU Memory
21 User mode component [Flowchart]
Start: initialize events (InvokerEvent, DoneEvent); initialize shared memory (exchangearea); propagate events and shared memory to driver.
Loop: wait for request (transitions labelled "DoneEvent" and "InvokerEvent"); read request from exchangearea; for a read request, enqueue OpenCL read request; for a write request, enqueue OpenCL write request; write result to exchangearea.
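The handshake on slide 21 between the driver and the user mode component can be illustrated with two events and a shared exchange area. The Python sketch below is an analogy under assumptions: `threading.Event` stands in for the kernel events, a plain object stands in for the shared memory, a dictionary stands in for the OpenCL buffer, and the function names, the request tuples, and the `stop` event are hypothetical.

```python
import threading


class ExchangeArea:
    """Stands in for the shared memory between driver and component."""

    def __init__(self):
        self.request = None  # ("read", page) or ("write", page, data)
        self.result = None


def user_mode_component(area, invoker_event, done_event, gpu_memory, stop):
    while True:
        invoker_event.wait()       # wait for request (InvokerEvent)
        invoker_event.clear()
        if stop.is_set():
            break
        request = area.request     # read request from exchangearea
        if request[0] == "read":
            # Stands in for enqueueing an OpenCL read request.
            area.result = gpu_memory.get(request[1])
        else:
            # Stands in for enqueueing an OpenCL write request.
            gpu_memory[request[1]] = request[2]
            area.result = True
        done_event.set()           # result is in exchangearea (DoneEvent)


def driver_call(area, invoker_event, done_event, request):
    # Driver side: place the request, signal the component, wait for it.
    area.request = request
    invoker_event.set()
    done_event.wait()
    done_event.clear()
    return area.result


area = ExchangeArea()
invoker, done, stop = threading.Event(), threading.Event(), threading.Event()
gpu_memory = {}
worker = threading.Thread(
    target=user_mode_component, args=(area, invoker, done, gpu_memory, stop)
)
worker.start()

driver_call(area, invoker, done, ("write", 3, b"page-3"))
page = driver_call(area, invoker, done, ("read", 3))

# Shut the component down: set the stop flag, then wake it once more.
stop.set()
invoker.set()
worker.join()
```

Every request crosses the kernel/user boundary twice in this scheme, once for each event, which is one plausible source of overhead for the user mode variant.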
22 Accessing GPU Memory with OpenCL
23 [Architecture diagram]
User Mode: Application (x4), Usermode Component (DirectDraw); label: One time Initialization
Kernel Mode: Pagefile, IO Manager, GPU Paging Cache Driver, GPU Driver
Hardware: Disk, RAM, GPU Memory
24 Accessing GPU Memory with DirectDraw Surface Pointers
25 What are the right Performance Metrics?
26 Distribution of page read block sizes
27 Write requests to pagefile [Figure: occurrences by size in bytes. Paging request writes of 1 MB: 95%; other: 5%]
28 Distribution of page write block sizes
29 Performance Metric
30 Evaluation: What's in it for me?
31 Tested Hardware
ID: 2600k 460gtx; E gtx; 2620m 540mgt
CPU: Intel Core i7-2600k 3,7 GHz; Intel Core 2 Duo E8500 3,17 GHz; Intel Core i7-2620m 2,7 GHz
GPU: Nvidia GeForce GTX MB GDDR5; Nvidia GeForce GTX MB GDDR3; Nvidia GeForce GT 540M 3 GB GDDR3
Memory: 8 GB DDR3; 8 GB DDR2; 8 GB DDR3
Disk: Intel Rapid Storage 64 GB SSD 1 TB HDD 240 GB SSD
Operating System: Windows 7 Ultimate SP1 64 bit; Windows 7 Ultimate SP1 64 bit; Windows 7 Ultimate SP1 64 bit
32 Evaluation: weighted performance (chart axis from 1,00 MB/s to 10000,00 MB/s)
System; Disk; User Mode GPU-based Paging Cache; Kernel Mode GPU-based Paging Cache; System Memory
2600k 460gtx; 18,33 MB/s; 112,78 MB/s; 106,58 MB/s; 1061,18 MB/s
2620m 540mgt; 24,96 MB/s; 22,01 MB/s; 105,97 MB/s; 986,77 MB/s
E gtx; 5,37 MB/s; 57,69 MB/s; 207,14 MB/s; 629,28 MB/s
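To compare the variants on slide 32, each value can be divided by the disk figure of the same system. The Python sketch below does only that arithmetic; the numbers are copied from the slide (decimal comma written as a point), and the key names and the helper `speedup_over_disk` are illustrative assumptions.

```python
# Weighted performance in MB/s, copied from the slide.
weighted = {
    "2600k 460gtx": {"disk": 18.33, "user_mode": 112.78,
                     "kernel_mode": 106.58, "system_memory": 1061.18},
    "2620m 540mgt": {"disk": 24.96, "user_mode": 22.01,
                     "kernel_mode": 105.97, "system_memory": 986.77},
    "E gtx": {"disk": 5.37, "user_mode": 57.69,
              "kernel_mode": 207.14, "system_memory": 629.28},
}


def speedup_over_disk(row):
    """Ratio of each variant's throughput to the disk throughput."""
    return {
        name: value / row["disk"]
        for name, value in row.items()
        if name != "disk"
    }


speedups = {system: speedup_over_disk(row) for system, row in weighted.items()}

for system, ratios in speedups.items():
    rounded = {name: round(ratio, 2) for name, ratio in ratios.items()}
    print(system, rounded)
```

A ratio below 1 means the variant is slower than the disk. With the slide's numbers this happens only for the user mode cache on the 2620m 540mgt system (22,01 MB/s against 24,96 MB/s), while the kernel mode cache is above 1 on all three systems.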
33 Overview of the measurements for the first test system
34 Disk throughput of the first test system
35 Performance Evaluation: throughput (chart axis from 1,00 MB/s to 10000,00 MB/s; the chart also carries the label "SSD")
System; Disk; User Mode GPU-based Paging Cache; Kernel Mode GPU-based Paging Cache; System Memory
E gtx; 1,87 MB/s; 51,29 MB/s; 92,05 MB/s; 777,76 MB/s
E gtx; 9,61 MB/s; 62,98 MB/s; 233,30 MB/s; 726,51 MB/s
2620m 540mgt; 28,79 MB/s; 24,18 MB/s; 126,94 MB/s; 1108,99 MB/s
2600k 460gtx; 34,70 MB/s; 118,23 MB/s; 130,05 MB/s; 1241,82 MB/s
36 Challenges / Open Research Questions
We need stable standardized kernel driver interfaces to the GPU Compute Devices and their memory!
- Optimal page transfer sizes / optimal placement
- Bandwidth limitations and communication overhead
- When to swap pages to disk, or to GPU?
- Which user types benefit the most?
- Hardware architecture: integrated GPU vs. dedicated GPU vs. hybrid; multi-GPU; dynamic GPU properties
37 Conclusion
Accelerators:
- New system architectures
- Operating systems lack appropriate abstractions
- New concepts to manage/handle accelerators
- Various approaches (novel OS design vs. legacy integration)
GPU Page Cache:
- Leverage GPU Global Memory for caching non-resident pages
- Reduce paging operations latencies
38 The GPU Paging Cache Operating Systems must support GPU abstractions GPU as yet another processor Resource abstraction, scheduling, security Transparent data transfer and execution on all types of accelerators GPU Memory as a Paging Cache. Everyone should benefit from GPU Computing!
More informationMaximum Performance. How to get it and how to avoid pitfalls. Christoph Lameter, PhD
Maximum Performance How to get it and how to avoid pitfalls Christoph Lameter, PhD cl@linux.com Performance Just push a button? Systems are optimized by default for good general performance in all areas.
More informationScaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX
Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX Inventing Internet TV Available in more than 190 countries 104+ million subscribers Lots of Streaming == Lots of Traffic
More informationRe-architecting Virtualization in Heterogeneous Multicore Systems
Re-architecting Virtualization in Heterogeneous Multicore Systems Himanshu Raj, Sanjay Kumar, Vishakha Gupta, Gregory Diamos, Nawaf Alamoosa, Ada Gavrilovska, Karsten Schwan, Sudhakar Yalamanchili College
More informationEvaluation and Exploration of Next Generation Systems for Applicability and Performance Volodymyr Kindratenko Guochun Shi
Evaluation and Exploration of Next Generation Systems for Applicability and Performance Volodymyr Kindratenko Guochun Shi National Center for Supercomputing Applications University of Illinois at Urbana-Champaign
More informationFEMAP/NX NASTRAN PERFORMANCE TUNING
FEMAP/NX NASTRAN PERFORMANCE TUNING Chris Teague - Saratech (949) 481-3267 www.saratechinc.com NX Nastran Hardware Performance History Running Nastran in 1984: Cray Y-MP, 32 Bits! (X-MP was only 24 Bits)
More informationX10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management
X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management Hideyuki Shamoto, Tatsuhiro Chiba, Mikio Takeuchi Tokyo Institute of Technology IBM Research Tokyo Programming for large
More informationJCudaMP: OpenMP/Java on CUDA
JCudaMP: OpenMP/Java on CUDA Georg Dotzler, Ronald Veldema, Michael Klemm Programming Systems Group Martensstraße 3 91058 Erlangen Motivation Write once, run anywhere - Java Slogan created by Sun Microsystems
More information