CS A490 Digital Media and Interactive Systems
1 CS A490 Digital Media and Interactive Systems
Lecture 11: Thread Scaling and I/O
Threading and Async I/O on Linux
October 30, 2013
Sam Siewert
2 Parallel Processing Speed-up
Grid Data Processing Speed-up
1. Multi-Core, Multi-threaded, Macro-blocks/Frames
2. SIMD, Vector Instructions Operating over Large Words (Many Times Instruction Set Size)
3. Co-Processor Operates in Parallel to CPU(s)
SPMD GPU or GP-GPU Co-Processor
- PCI-Express Bus Interfaces
- Transfer Program and Data to Co-Processor
- Threads and Blocks to Transform Data Concurrently
Image Data Processing
- Few Data Dependencies
- Good Speed-up by Amdahl's Law
- P = Parallel Portion
- (1-P) = Sequential Portion
- S = # of Cores (Concurrency)
- Overhead for Co-Processor: IO for Co-Processing
$$\text{Max\_Speed\_Up} = \frac{1}{1 - P} \quad (S \text{ is infinite here})$$
$$\text{Multicore\_Speed\_Up} = \frac{1}{(1 - P) + P / S}$$
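The two Amdahl's Law expressions on this slide can be evaluated directly. The following is a minimal Python sketch; the function names `multicore_speed_up` and `max_speed_up` are illustrative choices, not names from the lecture.

```python
import math


def multicore_speed_up(p: float, s: float) -> float:
    """Amdahl's Law speed-up for parallel portion p (0..1) on s cores."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s < 1:
        raise ValueError("s must be at least 1")
    return 1.0 / ((1.0 - p) + p / s)


def max_speed_up(p: float) -> float:
    """Limit of the speed-up as the number of cores goes to infinity."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p == 1.0:
        return math.inf  # all code parallel: no sequential bound
    return 1.0 / (1.0 - p)
```

For example, `max_speed_up(0.95)` evaluates to about 20, and `multicore_speed_up(0.0, 8)` stays at 1.0 because a fully sequential program gains nothing from extra cores.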
3 Amdahl's Law Infinite Cores
- Maximum Speed-up Driven by Sequential and Parallel Portions of Program
- P = Parallel Portion, (1-P) = Sequential Portion
- Speed-up for Given Multi-core Architecture is a Function of # of Cores (Speed-up in Parallel Portions)
[Figure: Amdahl's Law Max Speed-up (Any Number of Processor Cores). Algorithm speed-up plotted against the sequential portion (% computation in sequential vs. parallel execution). Annotations: All Code Parallel (Infinite Speed-up); 95% Parallel (20x Speed-up); No Parallel Portion, All Sequential (No Speed-up).]
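The three annotated cases follow from the infinite-core limit of Amdahl's Law. A short worked derivation, using the slide's symbols P (parallel portion) and S (number of cores):

```latex
\begin{align*}
\text{Speed-up}(P, S) &= \frac{1}{(1 - P) + P / S},
&\lim_{S \to \infty} \text{Speed-up}(P, S) &= \frac{1}{1 - P} \\[4pt]
P = 0.95: &\quad \frac{1}{1 - 0.95} = \frac{1}{0.05} = 20
&&\text{(95\% parallel, 20x speed-up)} \\
P = 0: &\quad \frac{1}{1 - 0} = 1
&&\text{(all sequential, no speed-up)} \\
P \to 1: &\quad \frac{1}{1 - P} \to \infty
&&\text{(all code parallel, unbounded speed-up)}
\end{align*}
```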
4 Multi-Core Speed-Up
[Figure: Amdahl's Law, Speed-up with # Cores and Parallel Portion. Speed-up plotted against the sequential portion of the algorithm (% parallel program), with curves for Max Speed-up and for 2, 4, 8, 12, and 32 cores.]
5 Hiding IO Latency: Overlapping with Processing
- Simple Design: Each Thread has READ, PROCESS, WRITE-BACK Execution
- Sequence: READ F(1), Process F(1), Write-back F(1), READ F(2), ...
- Frame rate is set by the READ+PROCESS+WRITE latency, e.g. 10 fps for 100 milliseconds
- If READ is 70 msec, PROCESS is 10 msec, and WRITE-BACK is 20 msec, the predominant time is IO time, not processing
- A disk drive with a 100 MB/sec READ rate can only read 16 fps at 62.5 msec READ latency (which corresponds to frames of about 6.25 MB)
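The arithmetic on this slide can be checked with a short Python sketch. The function names are illustrative, and the 6.25 MB frame size is an assumption inferred from the slide's 100 MB/sec and 16 fps figures rather than a value the slide states.

```python
def frame_rate_fps(read_ms: float, process_ms: float, write_ms: float) -> float:
    """Frame rate when READ, PROCESS and WRITE-BACK run back to back."""
    return 1000.0 / (read_ms + process_ms + write_ms)


def io_fraction(read_ms: float, process_ms: float, write_ms: float) -> float:
    """Share of the per-frame latency spent in IO rather than processing."""
    return (read_ms + write_ms) / (read_ms + process_ms + write_ms)


def read_limited_fps(disk_mb_per_s: float, frame_mb: float) -> float:
    """Upper bound on frame rate imposed by the disk read rate alone."""
    return disk_mb_per_s / frame_mb
```

With the slide's numbers, `frame_rate_fps(70, 10, 20)` gives 10 fps, `io_fraction(70, 10, 20)` gives 0.9, and `read_limited_fps(100, 6.25)` gives 16 fps under the assumed frame size.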
6 Hiding IO Latency: Schedule Multiple Overlapping Threads?
[Timeline diagram, shown once for Core #1 and once for Core #2: three staggered threads each run READ, Process, Write-back in sequence (first thread on F1 then F4, second on F2 then F5, third on F3 then F6), so that after start-up the core is continuously processing.]
- Requires N threads = N stages x N cores
- 1.5 to 2x Number of Threads for SMT (Hyper-threading)
- For IO Stage Duration Similar to Processing Time
- More Threads if IO Time (Read+WB+Read) >> 3 x Processing Time
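One way to picture the overlapped schedule is a set of threads that each run the full READ, PROCESS, WRITE-BACK sequence on every N-th frame. The sketch below is a Python illustration under that assumption; `threads_needed`, `run_overlapped`, and the `read`, `process`, and `write_back` callables are hypothetical names, not code from the lecture. In standard CPython builds the global interpreter lock lets threads overlap blocking IO but not CPU-bound Python code, so this shows the scheduling pattern rather than a tuned implementation.

```python
import threading


def threads_needed(stages: int, cores: int) -> int:
    """Slide rule of thumb: N threads = N stages x N cores."""
    return stages * cores


def run_overlapped(frame_ids, num_threads, read, process, write_back):
    """Each thread handles frames k, k + num_threads, k + 2 * num_threads, ..."""

    def worker(first_index):
        for i in range(first_index, len(frame_ids), num_threads):
            frame_id = frame_ids[i]
            data = read(frame_id)             # READ stage
            result = process(data)            # PROCESS stage
            write_back(frame_id, result)      # WRITE-BACK stage

    workers = [
        threading.Thread(target=worker, args=(k,)) for k in range(num_threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

With three threads and frames 1 to 6, the first thread handles frames 1 and 4, the second 2 and 5, and the third 3 and 6, matching the staggered rows in the diagram. The `write_back` callable is invoked from several threads at once, so it needs to be thread-safe.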
7 Hiding Latency: Dedicated IO
- Schedule Reads Ahead of Processing
[Timeline diagram: a read thread issues Read F1 through Read F8 back to back; after an initial wait, one processing thread handles Process F1, F3, F5 and a second handles Process F2, F4, F6; after an initial wait, a write-back thread issues WB F1 through WB F6. Phases: Start-up, Dual-Core Concurrent Processing, Completion.]
- Requires N threads = 2 + N cores
- Synchronize Frame Ready/Write-backs
- Balance Stage Read/Write-Back Latency to Processing
- 1.5 to 2x Threads for SMT (Hyper-threading)
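The dedicated-IO layout can be sketched with one reader thread, one write-back thread, and one processing thread per core, synchronized through queues. This is a minimal Python sketch under those assumptions; `run_dedicated_io`, the `read_ahead` depth, and the callables are hypothetical names chosen for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the frame stream


def run_dedicated_io(frame_ids, num_cores, read, process, write_back, read_ahead=4):
    """Run 2 + num_cores threads; returns the number of threads used."""
    ready = queue.Queue(maxsize=read_ahead)  # frames read ahead of processing
    finished = queue.Queue()                 # processed frames awaiting write-back

    def reader():
        for fid in frame_ids:
            ready.put((fid, read(fid)))      # blocks once read_ahead frames are queued
        for _ in range(num_cores):
            ready.put(_DONE)                 # one stop marker per processing thread

    def worker():
        while True:
            item = ready.get()               # waits until a frame is ready
            if item is _DONE:
                finished.put(_DONE)
                return
            fid, data = item
            finished.put((fid, process(data)))

    def writer():
        remaining = num_cores
        while remaining:
            item = finished.get()
            if item is _DONE:
                remaining -= 1               # one processing thread has drained
            else:
                write_back(*item)

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    threads += [threading.Thread(target=worker) for _ in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(threads)
```

The bounded `ready` queue is what keeps reads ahead of processing without letting the reader run arbitrarily far ahead. Write-backs may complete out of frame order in this sketch, since whichever processing thread finishes first is written first.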
8 Processing Latency Alone
- Write Code with Memory Resident Frames
- Load Frames in Advance
- Process In-Memory Frames Over and Over
- Do No IO During Processing
- Provides Baseline Measurement of Processing Latency per Frame Alone
- Provides Method of Optimizing Processing Without IO Latency
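A baseline measurement along these lines could look like the following Python sketch. It assumes the frames are already loaded into a list and that `transform` is the frame-processing function under test; both names, and the `passes` parameter, are illustrative.

```python
import time


def processing_latency_per_frame(frames, transform, passes=10):
    """Average seconds per frame over repeated passes, with no IO in the loop."""
    if not frames or passes < 1:
        raise ValueError("need at least one frame and one pass")
    start = time.perf_counter()
    for _ in range(passes):
        for frame in frames:          # frames are memory resident
            transform(frame)
    elapsed = time.perf_counter() - start
    return elapsed / (passes * len(frames))
```

Repeating the passes averages out timer noise, and because nothing in the timed loop touches the disk, changes to `transform` can be compared without IO latency in the result.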
9 IO Latency Alone
- Comment Out Frame Transformation Code or Call Stubbed NULL Function
- Provides Measurement of IO Frame Rate Alone
- Essentially Zero Latency Transform
- No Change Between Input Frames and Output Frames
- Allows for Tuning of IO Scheduler and Threading
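The stubbed-transform measurement might be sketched as below, in Python. The names `null_transform` and `io_frame_rate`, the file paths, and the fixed `frame_bytes` size are assumptions for illustration. Operating-system caching can inflate the measured rate, so the result is only indicative.

```python
import os
import time


def null_transform(frame: bytes) -> bytes:
    """Stubbed transform: output frame is identical to the input frame."""
    return frame


def io_frame_rate(in_path, out_path, frame_bytes, transform=null_transform):
    """Copy frames from in_path to out_path; return (frame count, frames/sec)."""
    frames = 0
    start = time.perf_counter()
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        while True:
            frame = src.read(frame_bytes)
            if not frame:                  # end of input
                break
            dst.write(transform(frame))
            frames += 1
        dst.flush()
        os.fsync(dst.fileno())             # include the write-back to disk in the timing
    elapsed = time.perf_counter() - start
    return frames, (frames / elapsed if elapsed > 0 else float("inf"))
```

Swapping the real transform in through the `transform` parameter and comparing the two frame rates gives a rough view of how much of the total latency is IO.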
10 Tips for IO Scheduling
- blockdev --getra /dev/sda
- Should return 256
- Means that reads read-ahead up to 128K (256 sectors of 512 bytes)
- Function calls read, fread should request as much as possible
- Check actual bytes read, re-read as needed in a loop
- blockdev --setra /dev/sda (8MB)
- Switch CFQ to Deadline
- Use lsscsi to verify your disk is /dev/sda; substitute the block driver interface used for the file system if not sda
- cat /sys/block/sda/queue/scheduler
- echo deadline > /sys/block/sda/queue/scheduler
- Options are noop, cfq, deadline
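The advice to check the actual bytes read and re-read in a loop can be sketched in Python with `os.read`, which, like the C `read` call, may return fewer bytes than requested. The helper names `read_fully` and `read_ahead_bytes` are illustrative.

```python
import os


def read_ahead_bytes(sectors: int, sector_bytes: int = 512) -> int:
    """Convert a blockdev --getra value (512-byte sectors) to bytes."""
    return sectors * sector_bytes


def read_fully(fd: int, nbytes: int) -> bytes:
    """Request as much as possible, then loop until nbytes arrive or EOF."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = os.read(fd, remaining)   # may return a short read
        if not chunk:                    # empty result means end of file
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

`read_ahead_bytes(256)` gives 131072 bytes, the 128K read-ahead quoted on the slide. Callers of `read_fully` should compare the length of the result with `nbytes` to detect a file that ended early.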