S7105 ADAS/AD CHALLENGES: GPU SCHEDULING & SYNCHRONIZATION. Venugopala Madumbu, NVIDIA GTC D
|
|
- Easter Conley
- 6 years ago
- Views:
Transcription
1 S7105 ADAS/AD CHALLENGES: GPU SCHEDULING & SYNCHRONIZATION Venugopala Madumbu, NVIDIA GTC D
2 ADVANCED DRIVING ASSIST SYSTEMS (ADAS) & AUTONOMOUS DRIVING (AD) High Compute Workloads Mapped to GPU 2
3 ADAS/AD Requirements & Challenges Real-Time Behavior Determinism Freedom from Interference Performance Maximum Throughput Minimal Latency Priority of Functionalities Multi-Core CPU GPU/DSP/HWA 3
4 ADAS/AD WORKLOADS Challenges Illustrated Scenario#1 Standalone Exec GL Workload X msec Scenario#2 Standalone Exec CUDA Workload Scenario#3 Concurrent Exec GL Workload CUDA Workload > (X+Y) msec If so, How to Achieve determinism Achieve Freedom from interference Prioritize one Workload over other While also having maximum throughput minimum latency CUDA Workload GL Workload Y msec Time Shared GPU Execution Y msec X msec 4
5 GPU IN TEGRA High Level Tegra SoC Block Diagram CPU Host GPU Engines Other Clients (ISP, Display, etc.) CPU submits job/work to GPU GPU runs asynchronously to CPU GPU Memory Interface Memory Controller GPU has its own hardware scheduler (Host) It switches between workloads without CPU involvement DRAM 5
6 GPU SCHEDULING Concepts Channel independent stream of work on the GPU Command Push Buffer Command buffer written by Software and read by Hardware Channel Switching Save/restore GPU state on a channel switch Semaphores/SyncPoints Synchronization mechanism for events within the GPU Time Slice How long a GPU executes commands of a channel before a channel switch Run-list An ordered list of channels that SW wants the GPU to execute 6
7 GPU SCHEDULING Timesharing by Channel Switching Channel switching occurs when any ONE of the following happens: Time slice expires Engine runs out of work (no more commands) Blocked on a semaphore Channel Switch time = Drain Time + Save/Restore time Preemption can reduce Channel Switch times drastically App1 App2 App3 App4 Timesliced Round-Robin GPU GPU Occupancy Time 7
8 GPU SCHEDULING Preemption 8
9 GPU SCHEDULING Channel Switching with Time Slice Scenarios Channel 1 Time slice Channel Channel 1 1 Time slice Channel Channel 1 1 Time slice Channel Switch Timeout Channel Switch Timeout 1. Channel finishes before time slice expires Context switch to next channel Channel Reset 2. Channel preemption Stop all commands in pipeline Wait for engines to idle Higher Context Switch time 3. Channel Reset Engine could not idle and context could not save before channel switch timeout Callback to notify kernel of channel reset event 9
10 CHALLENGE REVISTED How can we achieve both? Real-Time behavior: Determinism Freedom from Interference Priority of Functionalities Performance: Maximum Throughput Minimal Latency 10
11 GPU SYNCHRONIZATION & SCHEDULING Software Control 1. User Driver Level (GPU Synchronization Approach) Syncpoints/Semaphores for Synchronization Through EglStreams, EGLSync etc 2. Kernel Driver Level (GPU Priority Scheduling Approach) Run-List Engineering How long channel runs Order of Channel execution 11
12 GPU SYNCHRONIZATION APPROACH No Synchronization Case Kernel launch GPU Semaphore GPU Priority GPU Task GPU Task Latency due to Concurrent Execution GPU Task CPU CPU Task CPU Task CPU Task msec 12
13 GPU SYNCHRONIZATION APPROACH Synchronization on CPU: Not good for GPU Kernel launch GPU Semaphore GPU Priority GPU Task GPU Task GPU Task CPU CPU Task CPU Task CPU Task msec 13
14 GPU SYNCHRONIZATION APPROACH Synchronization on GPU: No Context Switches Kernel launch GPU Semaphore GPU Priority GPU Task Delayed Start GPU Task GPU Task Determinism Freedom from Interference Priority of Functionalities CPU CPU Task CPU Task CPU Task msec 14
15 GPU PRIORITY SCHEDULING APPROACH Hypothetical Example TASK PRIORITY FPS WORST CASE EXECUTION TIME (WCET) H1 High 60 9ms H1 M1 Medium 30 4ms M1 M2 Medium 30 4ms M2 L1 Low/Best Effort 30 10ms L1 15
16 GPU PRIORITY SCHEDULING APPROACH Engineered Run-list and Time Slice Ensuring FPS and Latency H1 H1 H1 (Max Exec Time = 9 ms) M1 (Max Exec Time = 4 ms) M1 M1 Time slice = 9 ms Time slice = 3 ms M2 M2 L1 M2 (Max Exec Time = 4 ms) Time slice = 3 ms L1 (Max Exec Time = 10 ms) Time slice = 1 ms Run-List Ensured not >16ms for 60fps operation Work on GPU Time 16
17 GPU PRIORITY SCHEDULING APPROACH Reduce Latency for GPU Work Completion Ensure timeslice is long enough to complete work Ensure work is continually submitted and also well ahead in time To Avoid GPU idle time Unnecessary context switches 17
18 GPU SCHEDULING Best Practices to Keep GPU Busy Submit work in advance So the GPU has some work to execute at any point of time Try to reduce/eliminate work dependencies Have contingency plan for work overload If feedback shows over budget, submit work few frames ahead and spread Plan for worst case scenario Deal with GPU reset case esp for the Low priority cases GL Robustness Extensions 18
19 CONCLUSION GPU Synchronization & Scheduling Approaches Real-Time behavior: Determinism Freedom from Interference Priority of Functionalities Performance: Maximum Throughput Minimal Latency 19
20 Scott Whitman, NVIDIA Vladislav Buzov, NVIDIA Amit Rao, NVIDIA Yogesh Kini, NVIDIA ACKNOWLEDGEMENTS GTC Instructor led Lab:: L7105 EGLSTREAMS : INTEROPERABILITY OF CAMERA, CUDA AND OPENGL 11 TH MAY :30-11:30AM LL21D 20
21 Q&A 21
22 THANK YOU
EGLSTREAMS: INTEROPERABILITY FOR CAMERA, CUDA AND OPENGL. Debalina Bhattacharjee Sharan Ashwathnarayan
53023 - EGLSTREAMS: INTEROPERABILITY FOR CAMERA, CUDA AND OPENGL Debalina Bhattacharjee Sharan Ashwathnarayan Tegra SOC and typical use-cases Why Interops EGLStream and Its Key Features Agenda Examples
More informationRunning Multiple Workloads on a GPU A UX Oriented Approach. Yuval Sarna Graphics Software GameFly Streaming
Running Multiple Workloads on a GPU A UX Oriented Approach Yuval Sarna Graphics Software Expert @ GameFly Streaming Agenda Sharing the GPU We all like to Play Introduction to GPU Scheduling Proposed GPU
More informationComp 310 Computer Systems and Organization
Comp 310 Computer Systems and Organization Lecture #9 Process Management (CPU Scheduling) 1 Prof. Joseph Vybihal Announcements Oct 16 Midterm exam (in class) In class review Oct 14 (½ class review) Ass#2
More informationDeadline-based Scheduling for GPU with Preemption Support
Deadline-based Scheduling for GPU with Preemption Support N. Capodieci, R. Cavicchioli, M. Bertogna, A. Paramakuru. University of Modena and Reggio Emilia NVIDIA Corp. 12/12/2018 RTSS 2018, NASHVILLE 1
More informationGPU Fundamentals Jeff Larkin November 14, 2016
GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate
More informationAperiodic Servers (Issues)
Aperiodic Servers (Issues) Interference Causes more interference than simple periodic tasks Increased context switching Cache interference Accounting Context switch time Again, possibly more context switches
More informationChapter 6: CPU Scheduling. Operating System Concepts 9 th Edition
Chapter 6: CPU Scheduling Silberschatz, Galvin and Gagne 2013 Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Thread Scheduling Multiple-Processor Scheduling Real-Time
More informationBuilding Real-Time Professional Visualization Solutions on GPUs. Kristof Denolf Samuel Maroy Ronny Dewaele
Building Real-Time Professional Visualization Solutions on GPUs Kristof Denolf Samuel Maroy Ronny Dewaele Page 2 Outline Barco s professional visualization solutions The need for performance portability
More informationSync Points in the Intel Gfx Driver. Jesse Barnes Intel Open Source Technology Center
Sync Points in the Intel Gfx Driver Jesse Barnes Intel Open Source Technology Center 1 Agenda History and other implementations Other I/O layers - block device ordering NV_fence, ARB_sync EGL_native_fence_sync,
More informationS CUDA on Xavier
S8868 - CUDA on Xavier Anshuman Bhat CUDA Product Manager Saikat Dasadhikari CUDA Engineering 29 th March 2018 1 CUDA ECOSYSTEM 2018 CUDA DOWNLOADS IN 2017 3,500,000 CUDA REGISTERED DEVELOPERS 800,000
More informationGTC Interaction Simplified. Gesture Recognition Everywhere: Gesture Solutions on Tegra
GTC 2013 Interaction Simplified Gesture Recognition Everywhere: Gesture Solutions on Tegra eyesight at a Glance Touch-free technology providing an enhanced user experience. Easy and intuitive control
More informationNVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPING. September 13, 2016
NVIDIA AI BRAIN OF SELF DRIVING AND HD MAPPING September 13, 2016 AI FOR AUTONOMOUS DRIVING MAPPING KALDI LOCALIZATION DRIVENET Training on DGX-1 NVIDIA DGX-1 NVIDIA DRIVE PX 2 Driving with DriveWorks
More informationCS2506 Quick Revision
CS2506 Quick Revision OS Structure / Layer Kernel Structure Enter Kernel / Trap Instruction Classification of OS Process Definition Process Context Operations Process Management Child Process Thread Process
More informationBenchmarking Real-World In-Vehicle Applications
Benchmarking Real-World In-Vehicle Applications NVIDIA GTC 2015-03-18 m y c a b l e GmbH Michael Carstens-Behrens Gartenstraße 10 24534 Neumuenster, Germany +49 4321 559 56-55 +49 4321 559 56-10 mcb@mycable.de
More informationTHE LEADER IN VISUAL COMPUTING
MOBILE EMBEDDED THE LEADER IN VISUAL COMPUTING 2 TAKING OUR VISION TO REALITY HPC DESIGN and VISUALIZATION AUTO GAMING 3 BEST DEVELOPER EXPERIENCE Tools for Fast Development Debug and Performance Tuning
More informationSLICING THE WORKLOAD MULTI-GPU OPENGL RENDERING APPROACHES
SLICING THE WORKLOAD MULTI-GPU OPENGL RENDERING APPROACHES INGO ESSER NVIDIA DEVTECH PROVIZ OVERVIEW Motivation Tools of the trade Multi-GPU driver functions Multi-GPU programming functions Multi threaded
More informationPortland State University ECE 588/688. Graphics Processors
Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly
More informationCPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou ( Zhejiang University
Operating Systems (Fall/Winter 2018) CPU Scheduling Yajin Zhou (http://yajin.org) Zhejiang University Acknowledgement: some pages are based on the slides from Zhi Wang(fsu). Review Motivation to use threads
More informationCUDA Optimization with NVIDIA Nsight Visual Studio Edition 3.0. Julien Demouth, NVIDIA
CUDA Optimization with NVIDIA Nsight Visual Studio Edition 3.0 Julien Demouth, NVIDIA What Will You Learn? An iterative method to optimize your GPU code A way to conduct that method with Nsight VSE APOD
More informationCOMP 605: Introduction to Parallel Computing Lecture : GPU Architecture
COMP 605: Introduction to Parallel Computing Lecture : GPU Architecture Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State University (SDSU) Posted:
More informationParallel Processing SIMD, Vector and GPU s cont.
Parallel Processing SIMD, Vector and GPU s cont. EECS4201 Fall 2016 York University 1 Multithreading First, we start with multithreading Multithreading is used in GPU s 2 1 Thread Level Parallelism ILP
More informationChapter 6: CPU Scheduling. Operating System Concepts 9 th Edition
Chapter 6: CPU Scheduling Silberschatz, Galvin and Gagne 2013 Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Thread Scheduling Multiple-Processor Scheduling Real-Time
More informationHigh Performance Computing on GPUs using NVIDIA CUDA
High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and
More informationLECTURE 3:CPU SCHEDULING
LECTURE 3:CPU SCHEDULING 1 Outline Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time CPU Scheduling Operating Systems Examples Algorithm Evaluation 2 Objectives
More informationNuttX Realtime Programming
NuttX RTOS NuttX Realtime Programming Gregory Nutt Overview Interrupts Cooperative Scheduling Tasks Work Queues Realtime Schedulers Real Time == == Deterministic Response Latency Stimulus Response Deadline
More informationLecture 5 / Chapter 6 (CPU Scheduling) Basic Concepts. Scheduling Criteria Scheduling Algorithms
Operating System Lecture 5 / Chapter 6 (CPU Scheduling) Basic Concepts Scheduling Criteria Scheduling Algorithms OS Process Review Multicore Programming Multithreading Models Thread Libraries Implicit
More informationMultimedia Systems 2011/2012
Multimedia Systems 2011/2012 System Architecture Prof. Dr. Paul Müller University of Kaiserslautern Department of Computer Science Integrated Communication Systems ICSY http://www.icsy.de Sitemap 2 Hardware
More informationMaximizing Face Detection Performance
Maximizing Face Detection Performance Paulius Micikevicius Developer Technology Engineer, NVIDIA GTC 2015 1 Outline Very brief review of cascaded-classifiers Parallelization choices Reducing the amount
More informationVulkan Timeline Semaphores
Vulkan line Semaphores Jason Ekstrand September 2018 Copyright 2018 The Khronos Group Inc. - Page 1 Current Status of VkSemaphore Current VkSemaphores require a strict signal, wait, signal, wait pattern
More informationCS A331 Programming Language Concepts
CS A331 Programming Language Concepts Lecture 12 Alternative Language Examples (General Concurrency Issues and Concepts) March 30, 2014 Sam Siewert Major Concepts Concurrent Processing Processes, Tasks,
More informationMultiprocessor and Real- Time Scheduling. Chapter 10
Multiprocessor and Real- Time Scheduling Chapter 10 Classifications of Multiprocessor Loosely coupled multiprocessor each processor has its own memory and I/O channels Functionally specialized processors
More informationPreview. Process Scheduler. Process Scheduling Algorithms for Batch System. Process Scheduling Algorithms for Interactive System
Preview Process Scheduler Short Term Scheduler Long Term Scheduler Process Scheduling Algorithms for Batch System First Come First Serve Shortest Job First Shortest Remaining Job First Process Scheduling
More informationUniprocessor Scheduling. Basic Concepts Scheduling Criteria Scheduling Algorithms. Three level scheduling
Uniprocessor Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Three level scheduling 2 1 Types of Scheduling 3 Long- and Medium-Term Schedulers Long-term scheduler Determines which programs
More informationUniprocessor Scheduling. Chapter 9
Uniprocessor Scheduling Chapter 9 1 Aim of Scheduling Assign processes to be executed by the processor(s) Response time Throughput Processor efficiency 2 3 4 Long-Term Scheduling Determines which programs
More informationDepartment of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW
Department of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW 2 SO FAR talked about in-kernel building blocks: threads memory IPC drivers
More informationReference Model and Scheduling Policies for Real-Time Systems
ESG Seminar p.1/42 Reference Model and Scheduling Policies for Real-Time Systems Mayank Agarwal and Ankit Mathur Dept. of Computer Science and Engineering, Indian Institute of Technology Delhi ESG Seminar
More informationCMPS 111 Spring 2003 Midterm Exam May 8, Name: ID:
CMPS 111 Spring 2003 Midterm Exam May 8, 2003 Name: ID: This is a closed note, closed book exam. There are 20 multiple choice questions and 5 short answer questions. Plan your time accordingly. Part I:
More informationCS 856 Latency in Communication Systems
CS 856 Latency in Communication Systems Winter 2010 Latency Challenges CS 856, Winter 2010, Latency Challenges 1 Overview Sources of Latency low-level mechanisms services Application Requirements Latency
More informationDEEP DIVE INTO DYNAMIC PARALLELISM
April 4-7, 2016 Silicon Valley DEEP DIVE INTO DYNAMIC PARALLELISM SHANKARA RAO THEJASWI NANDITALE, NVIDIA CHRISTOPH ANGERER, NVIDIA 1 OVERVIEW AND INTRODUCTION 2 WHAT IS DYNAMIC PARALLELISM? The ability
More informationCourse Syllabus. Operating Systems
Course Syllabus. Introduction - History; Views; Concepts; Structure 2. Process Management - Processes; State + Resources; Threads; Unix implementation of Processes 3. Scheduling Paradigms; Unix; Modeling
More informationA Server-based Approach for Predictable GPU Access Control
A Server-based Approach for Predictable GPU Access Control Hyoseung Kim * Pratyush Patel Shige Wang Raj Rajkumar * University of California, Riverside Carnegie Mellon University General Motors R&D Benefits
More informationCSE 153 Design of Operating Systems
CSE 153 Design of Operating Systems Winter 2018 Midterm Review Midterm in class on Monday Covers material through scheduling and deadlock Based upon lecture material and modules of the book indicated on
More informationProject 2: CPU Scheduling Simulator
Project 2: CPU Scheduling Simulator CSCI 442, Spring 2017 Assigned Date: March 2, 2017 Intermediate Deliverable 1 Due: March 10, 2017 @ 11:59pm Intermediate Deliverable 2 Due: March 24, 2017 @ 11:59pm
More informationGstShark profiling: a real-life example. Michael Grüner - David Soto -
GstShark profiling: a real-life example Michael Grüner - michael.gruner@ridgerun.com David Soto - david.soto@ridgerun.com Introduction Michael Grüner Technical Lead at RidgeRun Digital signal processing
More informationFundamental CUDA Optimization. NVIDIA Corporation
Fundamental CUDA Optimization NVIDIA Corporation Outline Fermi/Kepler Architecture Kernel optimizations Launch configuration Global memory throughput Shared memory access Instruction throughput / control
More informationvs. GPU Performance Without the Answer University of Virginia Computer Engineering g Labs
Where is the Data? Why you Cannot Debate CPU vs. GPU Performance Without the Answer Chris Gregg and Kim Hazelwood University of Virginia Computer Engineering g Labs 1 GPUs and Data Transfer GPU computing
More informationCUDA OPTIMIZATIONS ISC 2011 Tutorial
CUDA OPTIMIZATIONS ISC 2011 Tutorial Tim C. Schroeder, NVIDIA Corporation Outline Kernel optimizations Launch configuration Global memory throughput Shared memory access Instruction throughput / control
More informationShadowfax: Scaling in Heterogeneous Cluster Systems via GPGPU Assemblies
Shadowfax: Scaling in Heterogeneous Cluster Systems via GPGPU Assemblies Alexander Merritt, Vishakha Gupta, Abhishek Verma, Ada Gavrilovska, Karsten Schwan {merritt.alex,abhishek.verma}@gatech.edu {vishakha,ada,schwan}@cc.gtaech.edu
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 10 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Chapter 6: CPU Scheduling Basic Concepts
More informationFundamental CUDA Optimization. NVIDIA Corporation
Fundamental CUDA Optimization NVIDIA Corporation Outline! Fermi Architecture! Kernel optimizations! Launch configuration! Global memory throughput! Shared memory access! Instruction throughput / control
More informationTasks. Task Implementation and management
Tasks Task Implementation and management Tasks Vocab Absolute time - real world time Relative time - time referenced to some event Interval - any slice of time characterized by start & end times Duration
More informationProtecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms
Consistent * Complete * Well Documented * Easy to Reuse * Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms Waqar Ali University of Kansas Lawrence, USA wali@ku.edu Heechul Yun University
More informationExam TI2720-C/TI2725-C Embedded Software
Exam TI2720-C/TI2725-C Embedded Software Wednesday April 16 2014 (18.30-21.30) Koen Langendoen In order to avoid misunderstanding on the syntactical correctness of code fragments in this examination, we
More informationIntroduction to Operating System. Dr. Aarti Singh Professor MMICT&BM MMU
Introduction to Operating System Dr. Aarti Singh Professor MMICT&BM MMU Contents Today's Topic: Introduction to Operating Systems We will learn 1. What is Operating System? 2. What OS does? 3. Structure
More informationOPERATING SYSTEMS CS3502 Spring Processor Scheduling. Chapter 5
OPERATING SYSTEMS CS3502 Spring 2018 Processor Scheduling Chapter 5 Goals of Processor Scheduling Scheduling is the sharing of the CPU among the processes in the ready queue The critical activities are:
More informationEfficient Video Processing on Embedded GPU
Efficient Video Processing on Embedded GPU Tobias Kammacher Armin Weiss Matthias Frei Institute of Embedded Systems High Performance Multimedia Research Group Zurich University of Applied Sciences (ZHAW)
More informationProperties of Processes
CPU Scheduling Properties of Processes CPU I/O Burst Cycle Process execution consists of a cycle of CPU execution and I/O wait. CPU burst distribution: CPU Scheduler Selects from among the processes that
More informationCPU-GPU Heterogeneous Computing
CPU-GPU Heterogeneous Computing Advanced Seminar "Computer Engineering Winter-Term 2015/16 Steffen Lammel 1 Content Introduction Motivation Characteristics of CPUs and GPUs Heterogeneous Computing Systems
More informationNon-Blocking Write Protocol NBW:
Non-Blocking Write Protocol NBW: A Solution to a Real-Time Synchronization Problem By: Hermann Kopetz and Johannes Reisinger Presented By: Jonathan Labin March 8 th 2005 Classic Mutual Exclusion Scenario
More informationReal-Time Architectures 2003/2004. Resource Reservation. Description. Resource reservation. Reinder J. Bril
Real-Time Architectures 2003/2004 Resource reservation Reinder J. Bril 03-05-2004 1 Resource Reservation Description Example Application domains Some issues Concluding remark 2 Description Resource reservation
More informationAvoiding Pitfalls when Using NVIDIA GPUs for Real-Time Tasks in Autonomous Systems
Avoiding Pitfalls when Using NVIDIA GPUs for Real-Time Tasks in Autonomous Systems Ming Yang,, Tanya Amert, Joshua Bakita, James H. Anderson, F. Donelson Smith All image sources and references are provided
More informationSummary. Ø Introduction Ø Project Goals Ø A Real-time Data Streaming Framework Ø Evaluations Ø Conclusions
Summary Ø Introduction Ø Project Goals Ø A Real-time Data Streaming Framework Ø Evaluations Ø Conclusions Introduction Java 8 has introduced Streams and lambda expressions to support the efficient processing
More informationCEC 450 Real-Time Systems
CEC 450 Real-Time Systems Lecture 6 Accounting for I/O Latency September 28, 2015 Sam Siewert A Service Release and Response C i WCET Input/Output Latency Interference Time Response Time = Time Actuation
More informationLecture 2 Process Management
Lecture 2 Process Management Process Concept An operating system executes a variety of programs: Batch system jobs Time-shared systems user programs or tasks The terms job and process may be interchangeable
More informationReal-Time Support for GPU. GPU Management Heechul Yun
Real-Time Support for GPU GPU Management Heechul Yun 1 This Week Topic: Real-Time Support for General Purpose Graphic Processing Unit (GPGPU) Today Background Challenges Real-Time GPU Management Frameworks
More informationGPGPU LAB. Case study: Finite-Difference Time- Domain Method on CUDA
GPGPU LAB Case study: Finite-Difference Time- Domain Method on CUDA Ana Balevic IPVS 1 Finite-Difference Time-Domain Method Numerical computation of solutions to partial differential equations Explicit
More informationIntegrating Concurrency Control and Energy Management in Device Drivers. Chenyang Lu
Integrating Concurrency Control and Energy Management in Device Drivers Chenyang Lu Overview Ø Concurrency Control: q Concurrency of I/O operations alone, not of threads in general q Synchronous vs. Asynchronous
More informationCUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions.
Schedule CUDA Digging further into the programming manual Application Programming Interface (API) text only part, sorry Image utilities (simple CUDA examples) Performace considerations Matrix multiplication
More informationCSCI-GA Operating Systems Lecture 3: Processes and Threads -Part 2 Scheduling Hubertus Franke
CSCI-GA.2250-001 Operating Systems Lecture 3: Processes and Threads -Part 2 Scheduling Hubertus Franke frankeh@cs.nyu.edu Processes Vs Threads The unit of dispatching is referred to as a thread or lightweight
More informationCMPS 111 Spring 2013 Prof. Scott A. Brandt Midterm Examination May 6, Name: ID:
CMPS 111 Spring 2013 Prof. Scott A. Brandt Midterm Examination May 6, 2013 Name: ID: This is a closed note, closed book exam. There are 23 multiple choice questions, 6 short answer questions. Plan your
More informationKernel optimizations Launch configuration Global memory throughput Shared memory access Instruction throughput / control flow
Fundamental Optimizations (GTC 2010) Paulius Micikevicius NVIDIA Outline Kernel optimizations Launch configuration Global memory throughput Shared memory access Instruction throughput / control flow Optimization
More informationREAL-TIME OVERVIEW SO FAR MICHAEL ROITZSCH. talked about in-kernel building blocks:
Department of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW 2 SO FAR talked about in-kernel building blocks: threads memory IPC drivers
More informationInterrupts and Time. Real-Time Systems, Lecture 5. Martina Maggio 28 January Lund University, Department of Automatic Control
Interrupts and Time Real-Time Systems, Lecture 5 Martina Maggio 28 January 2016 Lund University, Department of Automatic Control Content [Real-Time Control System: Chapter 5] 1. Interrupts 2. Clock Interrupts
More informationCS 370 Operating Systems
NAME S.ID. # CS 370 Operating Systems Mid-term Example Instructions: The exam time is 50 minutes. CLOSED BOOK. 1. [24 pts] Multiple choice. Check one. a. Multiprogramming is: An executable program that
More informationUniprocessor Scheduling. Aim of Scheduling
Uniprocessor Scheduling Chapter 9 Aim of Scheduling Response time Throughput Processor efficiency Types of Scheduling Long-Term Scheduling Determines which programs are admitted to the system for processing
More informationUniprocessor Scheduling. Aim of Scheduling. Types of Scheduling. Long-Term Scheduling. Chapter 9. Response time Throughput Processor efficiency
Uniprocessor Scheduling Chapter 9 Aim of Scheduling Response time Throughput Processor efficiency Types of Scheduling Long-Term Scheduling Determines which programs are admitted to the system for processing
More informationOperating Systems CMPSCI 377 Spring Mark Corner University of Massachusetts Amherst
Operating Systems CMPSCI 377 Spring 2017 Mark Corner University of Massachusetts Amherst Multilevel Feedback Queues (MLFQ) Multilevel feedback queues use past behavior to predict the future and assign
More informationStrong Scaling for Molecular Dynamics Applications
Strong Scaling for Molecular Dynamics Applications Scaling and Molecular Dynamics Strong Scaling How does solution time vary with the number of processors for a fixed problem size Classical Molecular Dynamics
More informationModeling Software with SystemC 3.0
Modeling Software with SystemC 3.0 Thorsten Grötker Synopsys, Inc. 6 th European SystemC Users Group Meeting Stresa, Italy, October 22, 2002 Agenda Roadmap Why Software Modeling? Today: What works and
More informationHybrid Implementation of 3D Kirchhoff Migration
Hybrid Implementation of 3D Kirchhoff Migration Max Grossman, Mauricio Araya-Polo, Gladys Gonzalez GTC, San Jose March 19, 2013 Agenda 1. Motivation 2. The Problem at Hand 3. Solution Strategy 4. GPU Implementation
More informationECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models Piccolo: Building Fast, Distributed Programs
More informationBoosting Quasi-Asynchronous I/Os (QASIOs)
Boosting Quasi-hronous s (QASIOs) Joint work with Daeho Jeong and Youngjae Lee Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu The Problem 2 Why?
More informationA Predictable RTOS. Mantis Cheng Department of Computer Science University of Victoria
A Predictable RTOS Mantis Cheng Department of Computer Science University of Victoria Outline I. Analysis of Timeliness Requirements II. Analysis of IO Requirements III. Time in Scheduling IV. IO in Scheduling
More informationCPU Scheduling Algorithms
CPU Scheduling Algorithms Notice: The slides for this lecture have been largely based on those accompanying the textbook Operating Systems Concepts with Java, by Silberschatz, Galvin, and Gagne (2007).
More informationReal-Time Operating Systems Issues. Realtime Scheduling in SunOS 5.0
Real-Time Operating Systems Issues Example of a real-time capable OS: Solaris. S. Khanna, M. Sebree, J.Zolnowsky. Realtime Scheduling in SunOS 5.0. USENIX - Winter 92. Problems with the design of general-purpose
More informationHigh-Performance Data Loading and Augmentation for Deep Neural Network Training
High-Performance Data Loading and Augmentation for Deep Neural Network Training Trevor Gale tgale@ece.neu.edu Steven Eliuk steven.eliuk@gmail.com Cameron Upright c.upright@samsung.com Roadmap 1. The General-Purpose
More informationOVERVIEW. Last Week: But if frequency of high priority task increases temporarily, system may encounter overload: Today: Slide 1. Slide 3.
OVERVIEW Last Week: Scheduling Algorithms Real-time systems Today: But if frequency of high priority task increases temporarily, system may encounter overload: Yet another real-time scheduling algorithm
More informationAntonio R. Miele Marco D. Santambrogio
Advanced Topics on Heterogeneous System Architectures GPU Politecnico di Milano Seminar Room A. Alario 18 November, 2015 Antonio R. Miele Marco D. Santambrogio Politecnico di Milano 2 Introduction First
More informationAn Overview of MIPS Multi-Threading. White Paper
Public Imagination Technologies An Overview of MIPS Multi-Threading White Paper Copyright Imagination Technologies Limited. All Rights Reserved. This document is Public. This publication contains proprietary
More informationInterrupt Service Threads - A New Approach to Handle Multiple Hard Real-Time Events on a Multithreaded Microcontroller
Interrupt Service Threads - A New Approach to Handle Multiple Hard Real-Time Events on a Multithreaded Microcontroller U. Brinkschulte, C. Krakowski J. Kreuzinger, Th. Ungerer Institute of Process Control,
More informationCUDA Programming Model
CUDA Xing Zeng, Dongyue Mou Introduction Example Pro & Contra Trend Introduction Example Pro & Contra Trend Introduction What is CUDA? - Compute Unified Device Architecture. - A powerful parallel programming
More informationUniprocessor Scheduling
Uniprocessor Scheduling Chapter 9 Operating Systems: Internals and Design Principles, 6/E William Stallings Patricia Roy Manatee Community College, Venice, FL 2008, Prentice Hall CPU- and I/O-bound processes
More informationMattan Erez. The University of Texas at Austin
EE382V (17325): Principles in Computer Architecture Parallelism and Locality Fall 2007 Lecture 12 GPU Architecture (NVIDIA G80) Mattan Erez The University of Texas at Austin Outline 3D graphics recap and
More informationProcesses and Threads
Processes and Threads Giuseppe Anastasi g.anastasi@iet.unipi.it Pervasive Computing & Networking Lab. () Dept. of Information Engineering, University of Pisa Based on original slides by Silberschatz, Galvin
More informationSOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION ACCELA ZHAO, LAYNE PENG
SOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION ACCELA ZHAO, LAYNE PENG 1 WHO ARE THOSE GUYS Accela Zhao, Technologist at EMC OCTO, active Openstack community contributor, experienced in cloud scheduling
More informationTEGRA LINUX DRIVER PACKAGE R23.2
TEGRA LINUX DRIVER PACKAGE R23.2 RN_05071-R23 February 25, 2016 Advance Information Subject to Change Release Notes RN_05071-R23 TABLE OF CONTENTS 1.0 ABOUT THIS RELEASE... 3 1.1 What s New... 3 1.2 Login
More informationMultithreaded Processors. Department of Electrical Engineering Stanford University
Lecture 12: Multithreaded Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 12-1 The Big Picture Previous lectures: Core design for single-thread
More informationCUDA Optimization: Memory Bandwidth Limited Kernels CUDA Webinar Tim C. Schroeder, HPC Developer Technology Engineer
CUDA Optimization: Memory Bandwidth Limited Kernels CUDA Webinar Tim C. Schroeder, HPC Developer Technology Engineer Outline We ll be focussing on optimizing global memory throughput on Fermi-class GPUs
More informationTHREADS ADMINISTRIVIA RECAP ALTERNATIVE 2 EXERCISES PAPER READING MICHAEL ROITZSCH 2
Department of Computer Science Institute for System Architecture, Operating Systems Group THREADS ADMINISTRIVIA MICHAEL ROITZSCH 2 EXERCISES due to date and room clashes we have to divert from our regular
More informationLecture 10. Efficient Host-Device Data Transfer
1 Lecture 10 Efficient Host-Device Data fer 2 Objective To learn the important concepts involved in copying (transferring) data between host and device System Interconnect Direct Memory Access Pinned memory
More information