Lecture 25: Interrupt Handling and Multi-Data Processing. Spring 2018 Jason Tang


Topics
- Interrupt handling
- Vector processing
- Multi-data processing

I/O Communication
Software needs to know when:
- an I/O device has completed an operation
- an I/O device has had an error
Software can either:
- poll the device (programmed I/O)
- wait for an I/O interrupt
Operating System Concepts, Silberschatz, Galvin, and Gagne

Data Transfers
Programmed I/O
- Advantage: simple to implement; processor is in complete control
- Disadvantage: CPU is stalled while waiting for the I/O response
Interrupt Driven
- Advantage: software is halted only during the actual transfer
- Disadvantage: device must raise an interrupt, and the processor must detect and handle the interrupt
Programmed I/O is best for frequent, small data transfers
Interrupt-driven I/O is best when transfers are infrequent, and when a dedicated DMA engine handles transfers of large blocks of data
Use PIO when the amount of data to transfer is less than the overhead of creating a TxD
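
Below is a minimal C sketch (not from the lecture) of the programmed-I/O case: the CPU busy-waits on a hypothetical memory-mapped status register before writing each word to a hypothetical data register. The register addresses and bit layout are invented for illustration.

    #include <stdint.h>

    /* Hypothetical memory-mapped device registers (addresses are made up) */
    #define DEV_STATUS ((volatile uint32_t *)0x40001000u)   /* bit 0 = ready */
    #define DEV_DATA   ((volatile uint32_t *)0x40001004u)

    /* Programmed I/O: the CPU polls the status register and is stalled
     * (busy-waiting) until the device is ready for each word */
    static void pio_write(const uint32_t *buf, int count)
    {
        for (int i = 0; i < count; i++) {
            while ((*DEV_STATUS & 0x1u) == 0)
                ;                        /* spin until the device is ready */
            *DEV_DATA = buf[i];          /* transfer one word */
        }
    }

With interrupt-driven I/O the spin loop disappears: the driver starts the transfer, the CPU runs other work, and the device's interrupt (handled as on the next slide) signals completion.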

Interrupt Handling
- Hardware sends an electrical signal on a physical interrupt line
- Processor detects that signal and translates it into an interrupt request (IRQ) number
- Processor then jumps to the kernel's interrupt handling code
- Kernel searches through its interrupt request table (stored in RAM) for the entry or entries that match the IRQ
- If found, the kernel jumps to the interrupt service routine (ISR) that was registered
- If not found, the kernel ignores the IRQ
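
A minimal sketch of the dispatch step, assuming a kernel that keeps a simple array indexed by IRQ number; the names (irq_table, handle_irq, register_isr) are hypothetical and not taken from any particular kernel.

    #include <stddef.h>

    #define MAX_IRQS 256

    typedef void (*isr_t)(int irq);        /* interrupt service routine */

    static isr_t irq_table[MAX_IRQS];      /* interrupt request table, kept in RAM */

    /* Called by the low-level interrupt entry code with the translated IRQ number */
    void handle_irq(int irq)
    {
        if (irq < 0 || irq >= MAX_IRQS)
            return;                        /* not a valid IRQ number */
        isr_t isr = irq_table[irq];
        if (isr != NULL)
            isr(irq);                      /* jump to the registered ISR */
        /* else: no handler registered, so the IRQ is ignored */
    }

    /* A driver registers its ISR for a given IRQ number */
    void register_isr(int irq, isr_t isr)
    {
        if (irq >= 0 && irq < MAX_IRQS)
            irq_table[irq] = isr;
    }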

Linux Kernel IRQ Handling
- On x86, the IRQ table is stored in the vector_irq[] array
- [Figure: path of an interrupt from the hardware PIC, through the kernel's IRQ code, to the registered handler]
- On x86, see arch/x86/kernel/irq.c
Linux Kernel Development, Love

Interrupt Controller
- Hardware component that collects interrupt signals from I/O devices
- Contains multiple enable registers, one per interrupt source
- When an interrupt is enabled, the controller forwards that interrupt to the processor
- When an interrupt is disabled, that interrupt source is masked
- Can also prioritize its output when multiple devices raise interrupts
ASB example: AMBA System Technical Reference Manual, 4.1.2
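
A rough sketch of the enable/mask behaviour, assuming a single memory-mapped enable register with one bit per interrupt source; the register address and layout are invented for illustration.

    #include <stdint.h>

    /* Hypothetical interrupt controller enable register (address is made up):
     * bit n set   -> source n enabled (forwarded to the processor)
     * bit n clear -> source n masked (dropped by the controller) */
    #define INTC_ENABLE ((volatile uint32_t *)0x48200008u)

    static void intc_enable_source(unsigned int src)
    {
        *INTC_ENABLE |= (1u << src);       /* unmask: forward this source */
    }

    static void intc_disable_source(unsigned int src)
    {
        *INTC_ENABLE &= ~(1u << src);      /* mask: suppress this source */
    }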

8259 PIC
- Original programmable interrupt controller for Intel-based desktops
- Has 8 inputs, organized by priority
- When an unmasked input is raised and no other interrupt is pending, the PIC raises the interrupt line to the CPU
- Superseded by the Advanced Programmable Interrupt Controller (APIC)
http://www.thesatya.com/8259.html
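
As an illustration of masking on the legacy 8259, the sketch below sets a bit in the master PIC's interrupt mask register at I/O port 0x21. It assumes an x86 Linux system, the inb()/outb()/ioperm() helpers from <sys/io.h>, and root privileges; on modern APIC-based machines this is only of historical interest.

    #include <stdio.h>
    #include <sys/io.h>    /* inb, outb, ioperm (x86 Linux, requires root) */

    #define PIC1_DATA 0x21 /* master 8259 data port: interrupt mask register */

    /* Mask (disable) one of the 8 inputs on the master 8259 */
    static void pic_mask_irq(unsigned int irq_line)   /* irq_line: 0..7 */
    {
        unsigned char mask = inb(PIC1_DATA);          /* read current mask */
        outb(mask | (1u << irq_line), PIC1_DATA);     /* set bit = masked */
    }

    int main(void)
    {
        if (ioperm(PIC1_DATA, 1, 1) != 0) {           /* request port access */
            perror("ioperm");
            return 1;
        }
        pic_mask_irq(3);                              /* e.g., mask input 3 */
        return 0;
    }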

Message Signaled Interrupts
- Newer alternative to line-based interrupts
- Instead of having dedicated wires to trigger interrupts, a device triggers an interrupt by writing to a special memory address
- Number of interrupts is no longer constrained by the size of the PIC
- Operating system does not need to poll devices to determine the source of an interrupt, as it must when multiple devices share an interrupt line
https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/using-message-signaled-interrupts

Parallel Processing
[Figure: bus-based multiprocessor; each processor has an MMU, a snoop tag, and a cache (tag and data), all connected over a memory-I/O bus to DRAM, a disk controller, a network interface, and an I/O bridge]
Modern computers are multiprocessors, able to execute multiple programs simultaneously

Multiprocessors
- Multicore systems are common in modern computers:

  Device              SoC          CPU                  Cores
  iPhone 4            A4           Cortex-A8            1
  iPhone 5 / 5c       A6           Swift                2
  iPhone 6 / 6+       A8           Typhoon              2
  iPhone 7 / 7+       A10 Fusion   Hurricane / Zephyr   2 / 2
  iPhone 8 / 8+ / X   A11 Bionic   Monsoon / Mistral    2 / 4

- Improvement in throughput by adding more cores is limited by Amdahl's Law
- Many modern software programs are written to take advantage of multiple processors
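
To make the Amdahl's Law point concrete, here is a small C sketch (not from the lecture) that computes the ideal speedup 1 / ((1 - p) + p/n) for a program in which a fraction p of the work can be parallelized across n cores; even with p = 0.9, the speedup can never exceed 10x no matter how many cores are added.

    #include <stdio.h>

    /* Amdahl's Law: speedup with n cores when a fraction p of the work
     * is parallelizable and the remaining (1 - p) stays serial */
    static double amdahl_speedup(double p, int n)
    {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void)
    {
        double p = 0.9;                    /* 90% of the program is parallel */
        for (int n = 1; n <= 64; n *= 2)
            printf("%2d cores -> speedup %.2fx\n", n, amdahl_speedup(p, n));
        /* As n grows, the speedup approaches 1 / (1 - p) = 10x and no more */
        return 0;
    }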

Parallel Processing
Many scientific and engineering problems involve looping over an array to perform some computation:

    for (i = 0; i < 64; i++)
        a[i] = b[i] + s;

    // x0 = s, x1 = byte offset (i * 4),
    // x3 = a, x4 = b
    top:  ldr  w2, [x4, x1]      // load b[i]
          add  w2, w2, w0        // add s
          str  w2, [x3, x1]      // store a[i]
          add  x1, x1, #4        // advance to the next 4-byte element
          cmp  x1, #256          // 64 elements * 4 bytes
          b.ne top

- Repeatedly fetching the same instructions wastes a lot of cycles
- More efficient to have the processor automatically perform the operation across a vector of data

Flynn's Taxonomy

                          Instruction Streams
                          One        Many
  Data Streams    One     SISD       MISD
                  Many    SIMD       MIMD

- Classification of computer architectures, based upon how the processor(s) handle data
- Instruction stream: number of processing unit(s) executing instruction(s)
- Data stream: number of data value(s) that the processing unit(s) are acting upon
- Traditional single-core system is SISD
https://www.geeksforgeeks.org/computer-architecture-flynns-taxonomy/

SIMD
- Single instruction that operates on multiple data, either stored in registers or in memory
- Common on modern systems, to perform vector arithmetic:
  - x86/64: MMX, SSE, SSE2, SSE3, SSSE3
  - PowerPC: AltiVec
  - ARM: NEON
- Fewer instructions to fetch, but requires more hardware

SIMD Subtypes
- True vector architecture: instruction specifies starting source and destination memory addresses, and how many times to execute the operation
  - Pipelined processor still executes only one calculation per cycle
  - Only one instruction fetch, but multiple cycles of execution, memory accesses, and write-backs
- Short-vector architecture: executes a single instruction across a few registers, treating each register as containing multiple independent data values
  - Example: ARM's NEON has 32 SIMD 128-bit registers, which can be treated as 2x 64-bit, 4x 32-bit, 8x 16-bit, or 16x 8-bit integers (signed or unsigned), or as 2x 64-bit or 4x 32-bit floating-point values
ARM Cortex-A Series Programmer's Guide, 7.2
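
The lane interpretations above can be pictured with a C union (an illustration only, not from the lecture; an actual NEON register lives in hardware, not in memory):

    #include <stdint.h>

    /* One 128-bit SIMD register viewed as lanes of different widths */
    typedef union {
        uint64_t u64[2];    /* 2 x 64-bit integer lanes         */
        uint32_t u32[4];    /* 4 x 32-bit integer lanes         */
        uint16_t u16[8];    /* 8 x 16-bit integer lanes         */
        uint8_t  u8[16];    /* 16 x 8-bit integer lanes         */
        double   f64[2];    /* 2 x double-precision float lanes */
        float    f32[4];    /* 4 x single-precision float lanes */
    } simd128_t;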

ARM NEON registers
- Extension to ARMv7 and ARMv8, intended to accelerate common audio and video processing
- Performs the same operation in all lanes of a vector
https://developer.arm.com/technologies/neon

ARMv8-A NEON Example

    /* add an array of floating point pairs */
    void add_float_neon2(float *dst, float *src1, float *src2, int count);

    add_float_neon2:
        ld1   {v0.4s}, [x1], #16
        ld1   {v1.4s}, [x2], #16
        fadd  v0.4s, v0.4s, v1.4s
        subs  x3, x3, #4
        st1   {v0.4s}, [x0], #16
        bgt   add_float_neon2
        ret

- ld1 loads data into a SIMD register; st1 stores data from a SIMD register
- The .4s suffix means the register is treated as 4 single-precision floating-point lanes
- fadd performs a vector floating-point add
https://community.arm.com/android-community/b/android/posts/arm-neon-programming-quick-reference
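
For comparison, roughly the same operation can be written in C using the NEON intrinsics from <arm_neon.h> (vld1q_f32, vaddq_f32, vst1q_f32); like the assembly above, this sketch assumes count is a multiple of 4.

    #include <arm_neon.h>

    /* Add two float arrays four lanes at a time using NEON intrinsics;
     * assumes count is a multiple of 4 */
    void add_float_neon_c(float *dst, const float *src1, const float *src2,
                          int count)
    {
        for (int i = 0; i < count; i += 4) {
            float32x4_t a = vld1q_f32(src1 + i);   /* load 4 floats */
            float32x4_t b = vld1q_f32(src2 + i);
            float32x4_t r = vaddq_f32(a, b);       /* lane-wise add */
            vst1q_f32(dst + i, r);                 /* store 4 floats */
        }
    }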

MISD
- Multiprocessor machine where every processing unit operates on the same dataset
- Built for fault-tolerant systems
- Example: space shuttle flight computer
- Otherwise, very rare

MIMD
- Multiprocessor machine, executing different instructions on different data independently
- Most modern systems are some type of MIMD
- Subtypes based upon memory model:
  - Centralized shared memory
  - Distributed shared memory

Centralized Shared Memory MIMD
[Figure: four processors, each with its own cache(s), sharing one memory-I/O bus that connects to DRAM and I/O]
- Uniform Memory Access (UMA): processors share a single centralized memory through a bus interconnect
- Feasible when there are only a few processors, with limited memory contention
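
As an illustration of the shared-memory programming model this organization supports, here is a small pthreads sketch (not from the lecture) in which four threads update one array held in the single shared memory; the names and sizes are chosen only for the example.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N        1024

    /* Arrays in the single, centralized memory shared by all processors */
    static int a[N], b[N];
    static int s = 3;

    static void *worker(void *arg)
    {
        long id = (long)arg;                /* which slice this thread owns */
        int chunk = N / NTHREADS;
        for (int i = id * chunk; i < (id + 1) * chunk; i++)
            a[i] = b[i] + s;                /* same loop as the earlier slide */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        for (int i = 0; i < N; i++)
            b[i] = i;
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, worker, (void *)t);
        for (long t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);
        printf("a[100] = %d\n", a[100]);    /* expect 103 */
        return 0;
    }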

Distributed Memory MIMD
[Figure: multiple processor + cache nodes, each with its own memory and I/O, connected through an interconnection network]
- Physically distribute memory, to avoid memory contention with a large processor count
- Processor nodes can have some I/O (clustering)
- The increased cost of communicating data is the major disadvantage