Microsecond Latency, Real-Time, Multi-Input/Output Control using GPU Processing
1 Microsecond Latency, Real-Time, Multi-Input/Output Control using GPU Processing
Nikolaus Rath (Columbia University), March 20th, 2013
2 Outline
1 Motivation
2 GPU Control System Architecture
3 Performance
4 Fusion Keeps the Sun Burning
Nuclear fusion is the process that keeps the sun burning.
Very hot hydrogen atoms (the "plasma") collide to form helium, releasing lots of energy.
It would be great to replicate this on Earth: plenty of fuel is available, and there is no risk of nuclear meltdown.
Challenges: heating things to millions of degrees (not so hard) and keeping them confined (very hard).
$^{2}\mathrm{H} + {}^{3}\mathrm{H} \rightarrow {}^{4}\mathrm{He}\,(3.5\ \mathrm{MeV}) + n\,(14.1\ \mathrm{MeV})$
5 At Millions of Degrees, Small Plasmas Evaporate Away
6 Magnetic Fields Constrain Plasma Movement to One Dimension
7 Closed Magnetic Fields Can Confine Plasmas
8 Tokamaks Confine Plasmas Using Magnetic Fields
Color legend: orange, magenta, green: magnetic field generating coils; violet: plasma; blue: a single magnetic field line (example).
1 meter radius, 1 million °C, Ampere current
9 Self-Generated Fields Cause Instabilities
Electric currents (which generate magnetic fields) flow not just in the coils, but also in the plasma itself.
The plasma thus modifies the fields that confine it, sometimes in a self-amplifying way: an instability.
Typical shape: rotating, helical deformation. Timescale: 50 microseconds.
10 Only High-Speed Feedback Control Can Preserve Confinement
Sensors detect deformations due to plasma currents.
Control coils dynamically push back: feedback control.
11 Outline
1 Motivation
2 GPU Control System Architecture
3 Performance
12 Real-Time Performance is Determined By Latency and Sampling Period
[Diagram: sample packets (S) travel from the digitizer through the GPU processing pipelines to the analog output; the latency and the sampling period are marked.]
Latency is the response time of the feedback system.
The sampling period determines smoothness.
Algorithmic complexity limits latency, not sampling period.
Both latency and sampling period need to be on the order of a few microseconds.
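The difference between latency and sampling period can be made concrete with a small calculation. If every sample needs `latency_us` to travel from digitizer to analog output and a new sample is taken every `period_us`, several samples are in the pipeline at once. The sketch below is only an illustration; the helper name is hypothetical and not part of the talk.

```c
#include <assert.h>

/* Upper bound on the number of samples simultaneously in flight in a
 * pipelined control loop: ceil(latency / period).
 * Hypothetical helper for illustration only. */
unsigned samples_in_flight(unsigned latency_us, unsigned period_us)
{
    assert(period_us > 0);
    return (latency_us + period_us - 1) / period_us;
}
```

With a 10 µs latency and a 4 µs sampling period, `samples_in_flight(10, 4)` yields 3: the pipeline can accept new samples faster than any single sample is processed, which is why the two quantities are limited by different things.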
13 Control Algorithm is Implemented in One Kernel
[Timeline diagram with two CPU/GPU column pairs; time runs downward.]
Multiple kernel launches, CPU: read input data; send data to GPU memory; start GPU kernel A; wait for GPU kernel A; read results from GPU memory; process results; send new data to GPU memory; start GPU kernel B; wait for GPU kernel B; read results from GPU memory; write output data. GPU: compute result a; compute result b.
One kernel, CPU: send parameters to GPU memory; start GPU kernel; wait for GPU kernel. GPU: read data; compute result a; process results; compute result b; write output data.
14 Redundant PCIe Transfers have to be Avoided To Reduce Latency
Traditional: data bounces through host RAM.
The PCIe bus has multi-GB/s throughput.
Transfer setup takes several µs.
Okay if data chunks are big and transfer and processing take long.
Bad if the setup latency is longer than the transfer time itself.
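The trade-off on this slide can be sketched with a simple cost model: a fixed setup time plus the payload divided by the bus throughput. The function name and the numbers used in the note below (3 µs setup, 4 GB/s) are assumptions chosen only to match the "several µs" and "multi GB/s" orders of magnitude; they are not measurements from the talk.

```c
#include <assert.h>

/* Estimated time in microseconds to move `bytes` over a bus with a fixed
 * per-transfer setup cost. All parameter values are illustrative. */
double transfer_time_us(double bytes, double setup_us, double gbytes_per_s)
{
    assert(gbytes_per_s > 0.0);
    /* 1 GB/s corresponds to 1000 bytes per microsecond. */
    return setup_us + bytes / (gbytes_per_s * 1000.0);
}
```

For a 192-byte packet (96 channels at 16 bit) the model gives about 3.05 µs, almost all of it setup, while a 1 MB chunk takes about 253 µs, almost all of it payload. Bouncing through host RAM pays the setup cost twice, which is negligible for large chunks and dominant for small ones.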
15 Redundant PCIe Transfers have to be Avoided To Reduce Latency
New: peer-to-peer transfers eliminate the need for a bounce buffer.
Good performance even for small amounts of data.
Can be implemented in software (kernel).
The required peer-to-peer capable root complex is present in most mid- to high-end mainboards.
16 Peer-to-peer PCIe transfers are set up by sharing BARs
[Diagram: the GPU (with GPU memory), the A/D module and the D/A module each expose BARs in the PCI address space (initialized from BIOS by CPU). The DMA controller of the A/D module writes to a GPU BAR, and the DMA controller of the D/A module reads from a GPU BAR.]
PCIe devices communicate via BARs in the PCI address space.
The GPU can map part of its memory into a BAR.
AD/DA modules can transfer to/from arbitrary PCI addresses.
The CPU establishes communication by telling the AD/DA modules about the GPU BAR.
This required some trickery in the past, but is officially supported as of CUDA 5.
17 Example: Userspace

    /* Allocate buffer with extra space for 64 kB alignment */
    CUdeviceptr dev_addr;
    cuMemAlloc(&dev_addr, size + 0xFFFF);

    /* Prepare mapping */
    CUDA_POINTER_ATTRIBUTE_P2P_TOKENS tokens;
    cuPointerGetAttribute(&tokens, CU_POINTER_ATTRIBUTE_P2P_TOKENS, dev_addr);

    /* Align to 64 kB */
    dev_addr += 0xFFFF;
    dev_addr &= ~(CUdeviceptr)0xFFFF;

    /* Call custom kernel module to get bus address; fd refers to an open device file */
    struct rdma_info s;
    s.dev_addr = dev_addr;
    s.p2ptoken = tokens.p2pToken;
    s.vaspacetoken = tokens.vaSpaceToken;
    s.size = size;
    ioctl(fd, RDMA_TRANSLATE_TOKEN, &s);
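The alignment step above rounds the device address up to the next 64 kB boundary, which is why the allocation asks for `size + 0xFFFF` bytes. The arithmetic can be checked in isolation with plain C; the helper names below are hypothetical and only illustrate the bit manipulation, they are not part of the CUDA API.

```c
#include <assert.h>
#include <stdint.h>

#define GPU_PAGE_SIZE 0x10000ULL /* 64 kB */

/* Round addr up to the next multiple of 64 kB (no change if already aligned). */
uint64_t align_up_64k(uint64_t addr)
{
    return (addr + (GPU_PAGE_SIZE - 1)) & ~(GPU_PAGE_SIZE - 1);
}

/* Returns 1 if `size` bytes starting at the aligned address still fit into
 * an allocation of size + 0xFFFF bytes that starts at `base`. */
int aligned_region_fits(uint64_t base, uint64_t size)
{
    uint64_t aligned = align_up_64k(base);
    assert(aligned >= base);
    return aligned + size <= base + size + (GPU_PAGE_SIZE - 1);
}
```

Rounding up moves the address forward by at most 0xFFFF bytes, so the padded allocation always has room for the aligned region.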
18 Example: Kernelspace

    long rtm_t_dma_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
    {
        nvidia_p2p_page_table_t *page_table;
        // ...
        switch (cmd) {
        case RDMA_TRANSLATE_TOKEN: {
            /* Fetch the tokens and device address passed in from userspace */
            COPY_FROM_USER(&rdma_info, varg, sizeof(struct rdma_info));
            /* Pin the GPU pages and obtain their page table */
            nvidia_p2p_get_pages(rdma_info.p2ptoken, rdma_info.vaspacetoken,
                                 rdma_info.dev_addr, rdma_info.size,
                                 &page_table, rdma_free_callback, tdev);
            /* Bus address of the first page is what the digitizer needs */
            rdma_info.bus_addr = page_table->pages[0]->physical_address;
            COPY_TO_USER(varg, &rdma_info, sizeof(struct rdma_info));
            return 0;
        }
        // Other ioctls
        }
    }
19 Userspace Continued

    /* Call custom kernel module to get bus address; fd refers to an open device file */
    rtm_t_rdma_info s;
    s.dev_addr = dev_addr;
    ioctl(fd, RTM_T_TRANSLATE_TOKEN, &s);

    /* Retrieve bus address */
    uint64_t bus_addr;
    bus_addr = s.bus_addr;

    /* Send bus address to digitizer */
    init_rtm_t(bus_addr, other, stuff, here);

    // Start GPU kernel
    // Kernel polls input buffer
    // Wait for kernel to complete
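The "kernel polls input buffer" step can be pictured as a spin loop on a buffer that the digitizer fills by DMA. The sketch below is plain host-side C rather than GPU code, and it assumes a buffer layout in which the writer bumps a sequence counter after each packet; the talk does not describe the actual layout, so `struct input_buffer` and `wait_for_packet` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the writer stores the samples, then increments `seq`. */
struct input_buffer {
    volatile uint32_t seq;
    volatile int16_t samples[96];
};

/* Spin until a new packet arrives or max_spins polls have elapsed.
 * Returns 1 and updates *last_seq on success, 0 on timeout. */
int wait_for_packet(const struct input_buffer *buf, uint32_t *last_seq,
                    unsigned long max_spins)
{
    assert(buf && last_seq);
    for (unsigned long i = 0; i < max_spins; i++) {
        uint32_t now = buf->seq; /* volatile read, re-fetched on every poll */
        if (now != *last_seq) {
            *last_seq = now;
            return 1;
        }
    }
    return 0;
}
```

Polling avoids the cost of launching a kernel per sample: the loop notices new data as soon as it lands in memory, at the price of keeping the processor busy.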
20 Outline
1 Motivation
2 GPU Control System Architecture
3 Performance
21 The HBT-EP Plasma Control System was Built with Commodity Hardware
Hardware:
Workstation PC
NVIDIA GeForce GTX 580
D-TACQ ACQ196 A-D converter (96 channels, 16 bit)
2 D-TACQ AO32CPCI D-A converters (2 x 32 channels, 16 bit)
Standard Linux host system (no real-time kernel required!)
22 P2P Transfers Reduce Latency by 50%
[Plot: latency (µs) versus sampling period (µs) for GPU RAM and host RAM.]
Optimal latency when using host memory: 16 µs.
Optimal latency when using GPU memory: 10 µs.
A 50% difference does not mean having to wait twice as long; it is the difference between things blowing up or not.
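The two optimal latencies quoted on this slide can be compared in two ways. The arithmetic below uses only the 16 µs and 10 µs figures given above:

```latex
\frac{16\,\mu\mathrm{s} - 10\,\mu\mathrm{s}}{16\,\mu\mathrm{s}} = 37.5\%,
\qquad
\frac{16\,\mu\mathrm{s} - 10\,\mu\mathrm{s}}{10\,\mu\mathrm{s}} = 60\%
```

Measured against the host-memory path, the latency drops by 37.5%; measured against the GPU-memory path, the host-memory path is 60% slower. The rounded "50%" in the slide title lies between these two readings, and the second one matches the "more than 50%" stated in the summary.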
23 GPU Beats CPU in Computational and Real-Time Performance even in the Microsecond Regime
Performance tested with repeated matrix application.
GPU beats CPU down to 5 µs.
Missed samples counted in 1000 runs.
Missed samples with GPU: none; with CPU: up to 2.5%.
[Plots with axes labeled Sampling Period (µs), Count, Matrix Size and Missed Samples (%), comparing GPU and CPU.]
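The benchmark workload is described only as "repeated matrix application". Assuming this means multiplying an input vector by a matrix to obtain the output vector, the core operation looks like the serial C sketch below. On a GPU each output row would typically be handled by its own thread; this version is for illustration and is not the author's kernel.

```c
#include <assert.h>
#include <stddef.h>

/* y = M x for a row-major matrix with `rows` rows and `cols` columns. */
void matvec(const float *m, const float *x, float *y, size_t rows, size_t cols)
{
    assert(m && x && y);
    for (size_t r = 0; r < rows; r++) {
        float acc = 0.0f;
        for (size_t c = 0; c < cols; c++)
            acc += m[r * cols + c] * x[c];
        y[r] = acc;
    }
}
```

The rows are independent of each other, so the work parallelizes naturally, which is one reason a GPU can keep up even at small matrix sizes.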
24 Summary
1 The advantages of GPUs are not restricted to large problems requiring long calculations.
2 Even when processing kB-sized batches under microsecond latency constraints, GPUs can be faster than CPUs, while at the same time offering better real-time performance.
3 In these regimes, data transfer overhead becomes the dominating factor, and using peer-to-peer transfers improves performance by more than 50%.
4 A GPU-based real-time control system has been developed at Columbia University and tested for the control of magnetically confined plasmas. Contact us for details.
25 Outline
4 Backup Slides
26 Latency and Sampling Period are Measured Experimentally by Copying Square Waves
[Plot: voltage versus time (µs) showing the control input, control output and sample clock traces, with markers A and B.]
The control algorithm is set up to copy input to output 1:1.
Blue trace: input square wave.
Green trace: output square wave.
The output lags behind the input by the control system latency.
Red trace: sampling interval (sampling on the downward edge).
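The measurement described on this slide amounts to finding the time offset between an edge of the input square wave and the corresponding edge of the output. A minimal sketch in C, assuming both traces are sampled on the same time base with spacing `dt_us` and that the output edge follows the input edge; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first sample where the trace rises through `threshold`,
 * or -1 if there is no rising edge. */
long first_rising_edge(const double *v, size_t n, double threshold)
{
    assert(v);
    for (size_t i = 1; i < n; i++)
        if (v[i - 1] < threshold && v[i] >= threshold)
            return (long)i;
    return -1;
}

/* Lag of the output edge behind the input edge in microseconds.
 * Returns -1.0 if either trace has no rising edge. */
double edge_lag_us(const double *in, const double *out, size_t n,
                   double threshold, double dt_us)
{
    long a = first_rising_edge(in, n, threshold);
    long b = first_rising_edge(out, n, threshold);
    if (a < 0 || b < 0)
        return -1.0;
    return (double)(b - a) * dt_us;
}
```

The resolution of this estimate is one oscilloscope sample, so the traces need to be recorded much faster than the control system's own sampling period.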
27 Plasma Physics Results: Dominant Mode Amplitude Reduced by up to 60%
[Plot: amplitude versus frequency (kHz) without feedback (No FB) and with feedback gains g=144 and g=577.]
29 Feedback Control uses Measurements to Determine Control Signals
[Block diagram: Input → Controller → Control Signal / Control Output → Actuators → Physical Interaction → System → Physical Interaction → Sensors → Measurements / Control Input → Controller.]
Goal: keep the system in a specific state.
If the system is perfectly known, the required control signals can be calculated in advance (open-loop control).
In practice, measurements are needed to determine effects and update the signals: feedback control.
A control system acquires measurements, performs computations, and generates control output to manipulate the system state.
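The feedback idea can be illustrated with a toy model: a scalar state that grows by a fixed factor each step unless a control signal proportional to the measured state pushes back. This is a deliberately simplified sketch and not the plasma model or the controller used in the talk; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate x[k+1] = growth * x[k] + u[k] with proportional feedback
 * u[k] = -gain * x[k]. Writes the trajectory x[0..n-1]. */
void simulate_feedback(double x0, double growth, double gain,
                       double *x, size_t n)
{
    assert(x && n > 0);
    x[0] = x0;
    for (size_t k = 0; k + 1 < n; k++) {
        double u = -gain * x[k]; /* control output from the measurement */
        x[k + 1] = growth * x[k] + u;
    }
}
```

With `growth = 1.5` and no feedback the state grows to 2.25 after two steps; with `gain = 1.0` it shrinks to 0.25. In this toy model the control signal acts within the same step as the measurement; a real system adds latency between the two, which is why the response time of the loop matters.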
30 Data Passthrough Establishes 8 µs Lower Latency Limit
[Plot: latency (µs) versus sampling period (µs) for GPU RAM and host RAM.]
The control system uses the same buffer to write input and read output.
No GPU processing, so no difference between host and GPU memory.
Jump: 4 µs required for A-D conversion and data push.
Offset: 4 µs required for data pull and D-A conversion.
More informationMemories: Memory Technology
Memories: Memory Technology Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 * Memory Hierarchy
More informationThe Fusion Distributed File System
Slide 1 / 44 The Fusion Distributed File System Dongfang Zhao February 2015 Slide 2 / 44 Outline Introduction FusionFS System Architecture Metadata Management Data Movement Implementation Details Unique
More informationTriton file systems - an introduction. slide 1 of 28
Triton file systems - an introduction slide 1 of 28 File systems Motivation & basic concepts Storage locations Basic flow of IO Do's and Don'ts Exercises slide 2 of 28 File systems: Motivation Case #1:
More informationDevices and Device Controllers. secondary storage (disks, tape) and storage controllers
I/O 1 Devices and Device Controllers network interface graphics adapter secondary storage (disks, tape) and storage controllers serial (e.g., mouse, keyboard) sound co-processors... I/O 2 Bus Architecture
More informationBus Architecture Example
I/O 1 network interface graphics adapter Devices and Device Controllers secondary storage (disks, tape) and storage controllers serial (e.g., mouse, keyboard) sound co-processors... I/O 2 Bus Architecture
More informationInteraction of Fluid Simulation Based on PhysX Physics Engine. Huibai Wang, Jianfei Wan, Fengquan Zhang
4th International Conference on Sensors, Measurement and Intelligent Materials (ICSMIM 2015) Interaction of Fluid Simulation Based on PhysX Physics Engine Huibai Wang, Jianfei Wan, Fengquan Zhang College
More informationEfficient Data Transfers
Efficient Data fers Slide credit: Slides adapted from David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2016 PCIE Review Typical Structure of a CUDA Program Global variables declaration Function prototypes global
More informationUsing Ethernet for real-time communication in a Nuclear Fusion Experiment
Using Ethernet for real-time communication in a Nuclear Fusion Experiment A. Luchetta, G. Manduchi, C. Taliercio Consorzio RFX Euratom-ENEA Association Corso Stati Uniti 4, 35127 Padova, Italy Gabriele
More informationCS510 Operating System Foundations. Jonathan Walpole
CS510 Operating System Foundations Jonathan Walpole OS-Related Hardware & Software 2 Lecture 2 Overview OS-Related Hardware & Software - complications in real systems - brief introduction to memory protection,
More informationRemote Persistent Memory With Nothing But Net Tom Talpey Microsoft
Remote Persistent Memory With Nothing But Net Tom Talpey Microsoft 1 Outline Aspiration RDMA NIC as a Persistent Memory storage adapter Steps to there: Flush Write-after-flush Integrity Privacy QoS Some
More informationChapter 6. Storage and Other I/O Topics
Chapter 6 Storage and Other I/O Topics Introduction I/O devices can be characterized by Behavior: input, output, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections
More information... Application Note AN-531. PCI Express System Interconnect Software Architecture. Notes Introduction. System Architecture.
PCI Express System Interconnect Software Architecture Application Note AN-531 Introduction By Kwok Kong A multi-peer system using a standard-based PCI Express (PCIe ) multi-port switch as the system interconnect
More informationDemystifying Network Cards
Demystifying Network Cards Paul Emmerich December 27, 2017 Chair of Network Architectures and Services About me PhD student at Researching performance of software packet processing systems Mostly working
More informationV. Primary & Secondary Memory!
V. Primary & Secondary Memory! Computer Architecture and Operating Systems & Operating Systems: 725G84 Ahmed Rezine 1 Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM)
More informationComputer Science 61C Spring Friedland and Weaver. Input/Output
Input/Output 1 A Computer is Useless without I/O I/O handles persistent storage Disks, SSD memory, etc I/O handles user interfaces Keyboard/mouse/display I/O handles network 2 Basic I/O: Devices are Memory
More informationFilesystem. Disclaimer: some slides are adopted from book authors slides with permission
Filesystem Disclaimer: some slides are adopted from book authors slides with permission 1 Recap Directory A special file contains (inode, filename) mappings Caching Directory cache Accelerate to find inode
More informationImmersive Out-of-Core Visualization of Large-Size and Long-Timescale Molecular Dynamics Trajectories
Immersive Out-of-Core Visualization of Large-Size and Long-Timescale Molecular Dynamics Trajectories J. Stone, K. Vandivort, K. Schulten Theoretical and Computational Biophysics Group Beckman Institute
More informationJohn W. Romein. Netherlands Institute for Radio Astronomy (ASTRON) Dwingeloo, the Netherlands
Signal Processing on GPUs for Radio Telescopes John W. Romein Netherlands Institute for Radio Astronomy (ASTRON) Dwingeloo, the Netherlands 1 Overview radio telescopes six radio telescope algorithms on
More informationCS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS
CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight
More information