Las time: Large Pages Last time: VAX translation. Today: I/O Devices

Similar documents
Systems Programming and Computer Architecture ( ) Timothy Roscoe

Systems Programming and Computer Architecture ( ) Timothy Roscoe

P6/Linux Memory System Nov 11, 2009"

Virtual Memory. Samira Khan Apr 27, 2017

Pentium/Linux Memory System March 17, 2005

Memory System Case Studies Oct. 13, 2008

14 May 2012 Virtual Memory. Definition: A process is an instance of a running program

Virtual Memory Oct. 29, 2002

Virtual Memory Nov 9, 2009"

CSE 351. Virtual Memory

CSE 153 Design of Operating Systems

Virtual Memory. Motivations for VM Address translation Accelerating translation with TLBs

virtual memory. March 23, Levels in Memory Hierarchy. DRAM vs. SRAM as a Cache. Page 1. Motivation #1: DRAM a Cache for Disk

Virtual Memory II. CSE 351 Autumn Instructor: Justin Hsia

Virtual Memory II. CSE 351 Autumn Instructor: Justin Hsia

Carnegie Mellon. 16 th Lecture, Mar. 20, Instructors: Todd C. Mowry & Anthony Rowe

Virtual Memory II CSE 351 Spring

Virtual Memory: Systems

Motivations for Virtual Memory Virtual Memory Oct. 29, Why VM Works? Motivation #1: DRAM a Cache for Disk

Computer Systems. Virtual Memory. Han, Hwansoo

RESET CLK RDn WRn CS0 CS1 CS2n DIN[7:0] CTSn DSRn DCDn RXDATA Rin A[2:0] DO[7:0] TxDATA DDIS RTSn DTRn OUT1n OUT2n BAUDOUTn TXRDYn RXRDYn INTRPT

Carnegie Mellon. Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition

CISC 360. Virtual Memory Dec. 4, 2008

UART Register Set. UART Master Controller. Tx FSM. Rx FSM XMIT FIFO RCVR. i_rx_clk o_intr. o_out1 o_txrdy_n. o_out2 o_rxdy_n i_cs0 i_cs1 i_ads_n

Chapter 5: Memory Management

CS 153 Design of Operating Systems

Virtual Memory. Alan L. Cox Some slides adapted from CMU slides

Input/Output Problems. External Devices. Input/Output Module. I/O Steps. I/O Module Function Computer Architecture

OS Extensibility: Spin, Exo-kernel and L4

Virtual Memory: Concepts

Virtual Memory. CS61, Lecture 15. Prof. Stephen Chong October 20, 2011

=0 Read/Write IER Interrupt Enable Register =1 Read/Write - Divisor Latch High Byte + 2

Lecture 19: Virtual Memory: Concepts

Processes and Virtual Memory Concepts

CS 153 Design of Operating Systems

Generic Model of I/O Module Interface to CPU and Memory Interface to one or more peripherals

198:231 Intro to Computer Organization. 198:231 Introduction to Computer Organization Lecture 14

Foundations of Computer Systems

Virtual Memory: Concepts

Spring 2017 :: CSE 506. Device Programming. Nima Honarmand

Interconnecting Components

Virtual Memory. Computer Systems Principles

I/O Management Intro. Chapter 5

Random-Access Memory (RAM) Systemprogrammering 2007 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics

Roadmap. Java: Assembly language: OS: Machine code: Computer system:

Problem 9. VM address translation. (9 points): The following problem concerns the way virtual addresses are translated into physical addresses.

Configurable UART with FIFO ver 2.20

I/O Devices. Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau)

Random-Access Memory (RAM) Systemprogrammering 2009 Föreläsning 4 Virtual Memory. Locality. The CPU-Memory Gap. Topics! The memory hierarchy

Last class: Today: Course administration OS definition, some history. Background on Computer Architecture

This lecture. Virtual Memory. Virtual memory (VM) CS Instructor: Sanjeev Se(a

I/O Devices. I/O Management Intro. Sample Data Rates. I/O Device Handling. Categories of I/O Devices (by usage)

Computer Organization ECE514. Chapter 5 Input/Output (9hrs)

CIS Operating Systems I/O Systems & Secondary Storage. Professor Qiang Zeng Fall 2017

Total Impact briq. Hardware Reference. 31st July Revision Overview 2

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 24

The control of I/O devices is a major concern for OS designers

CS 153 Design of Operating Systems Winter 2016

A Few Problems with Physical Addressing. Virtual Memory Process Abstraction, Part 2: Private Address Space

Digital System Design

a16450 Features General Description Universal Asynchronous Receiver/Transmitter

Organisasi Sistem Komputer

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2015 Lecture 23

Page Which had internal designation P5

Configurable UART ver 2.10

Question F5: Caching [10 pts]

Computer Structure. X86 Virtual Memory and TLB

Virtual Memory III /18-243: Introduc3on to Computer Systems 17 th Lecture, 23 March Instructors: Bill Nace and Gregory Kesden

Computer Architecture. Lecture 8: Virtual Memory

Address Translation. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

ADRIAN PERRIG & TORSTEN HOEFLER Networks and Operating Systems ( ) Chapter 6: Demand Paging

Serial Interfacing. Pulse width of 1 bit

CIS Operating Systems I/O Systems & Secondary Storage. Professor Qiang Zeng Spring 2018

Advanced Operating Systems (CS 202) Scheduling (1) Jan, 23, 2017

PC Interrupt Structure and 8259 DMA Controllers

Review: Hardware user/kernel boundary

Virtual Memory: Systems

This training session will cover the Translation Look aside Buffer or TLB

Virtual Memory: Concepts

CSE 120 Principles of Operating Systems Spring 2017

Computer Architecture Lecture 13: Virtual Memory II

Device I/O Programming

Lecture 13 Input/Output (I/O) Systems (chapter 13)

Introduction I/O 1. I/O devices can be characterized by Behavior: input, output, storage Partner: human or machine Data rate: bytes/sec, transfers/sec

Am186ER/Am188ER AMD continues 16-bit innovation

CS 318 Principles of Operating Systems

A VHDL UART core

CSE 120 Principles of Operating Systems

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. April 12, 2018 L16-1

University of Washington Virtual memory (VM)

Chapter 6: Demand Paging

Lab 2: 80x86 Interrupts

CS 134. Operating Systems. April 8, 2013 Lecture 20. Input/Output. Instructor: Neil Rhodes. Monday, April 7, 14

VIRTUAL MEMORY II. Jo, Heeseung

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. November 15, MIT Fall 2018 L20-1

I/O. Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. See: P&H Chapter 6.5-6

Address spaces and memory management

2.5 Address Space. The IBM 6x86 CPU can directly address 64 KBytes of I/O space and 4 GBytes of physical memory (Figure 2-24).

Guerrilla 7: Virtual Memory, ECC, IO

Signal types (some of them) Networks and Operating Systems ( ) Chapter 5: Memory Management. Where do signals come from?

Transcription:

Last time: P6 (ia32) Address Translation CPU 32 result L2 and DRAM Lecture 20: Devices and I/O Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 20 12 virtual address (VA) VPN VPO 10 VPN1 16 TLBT TLB miss 10 VPN2 4 TLBI PDE... TLB (16 sets, 4 entries/set) PTE TLB hit L1 hit 20 12 PPN PPO physical address (PA) L1 (128 sets, 4 lines/set)... L1 miss 20 7 5 CT CI CO Systems Group Department of Computer Science ETH Zürich PDBR Page tables Last time: VAX translation 0x 21 9 Addr VPN Addr VPO 10 PxTB Addr VPN 00 10 PTE VPN PTE VPO (in seg. Px: user space) (in seg. S0: system space) Las time: Large Pages 10 22 20 12 VPN VPO VPN VPO versus 10 22 20 12 PPN PPO PPN PPO 10 SPTB PTE VPN 00 Load 10 PTE PFN PTE VPO Load 10 Addr PFN Addr dr PFO Load Data mapping the PTE we want we want - -bit Reduces compulsory TLB misses How to use (Linux) hugetlbfs support (since at least 2.6.16) Use libhugetlbs {m,c,re}alloc replacements -matrix multiply: c a Solution: O(B 2 3 ) operations * each row used O(B) times but every time O(B 2 ) ops between b + c Today: I/O Devices

What is a device? Occupies some location on a bus registers interrupts Today: I/O Devices Registers Read input data CPU can store to device registers: Write output data Reset states Example: National Semiconductor ns16550 UART Universal Asynchronous Receiver/Transmitter RS-232 serial line Chip now integrated into Southbridge on all PCs Rarely seen on laptops any more First device driver OS Registers given in chip datasheets or data trusted by OS programmers From the data ns16550 UART Addressing registers 1. Registers appear as memory locations Access using loads/stores (movb/movw/movl/movq) 2. I/O instructions : Special instructions: inb, outb, etc. 3. Indirection: with the actual register value Used to save scarce address space (usually I/O space)

ns16550 LSR: Line Status Register Addr. Name Description Notes 0 RBR (read only) 0 THR Transmit Holding Register (write only) 1 IER Interrupt Enable Register 2 IIR 2 FCR FIFO Control Register (write only) 3 LCR Line Control Register 4 Control Register 5 LSR Line Status Register 6 7 SCR Scratch Register 0 DLL Divisor Latch (LSB) 1 DLAB = bit 7 of the LCR register 0 7 DR OE PE FE BI THRE TEMT ERRF Error in Receiver FIFO Transmitter Empty Transmit Holding Register Empty Framing Error Parity Error Overrun Error Data Ready Very simple UART driver Very simple UART driver #define UART_BASE 0x3f8 #define UART_THR (UART_BASE + 0) #define UART_RBR (UART_BASE + 0) #define UART_LSR (UART_BASE + 5) void serial_putc(char c) { // Wait until FIFO can hold more chars while( (inb(uart_lsr) & 0x20)== 0); // Write character to FIFO outb(uart_thr, c); } char serial_getc() { // Wait until there is a char to read while( (inb(uart_lsr) & 0x01) == 0); // Read from the receive FIFO return inb(uart_rbr); } Register addresses Send a character (wait Read a character (spin waiting until one is there to read) Uses Programmed I/O (PIO) All data must pass through CPU registers Uses polling Can t do anything else in the meantime Although could be extended with threads and care Without CPU polling, no I/O can occur Registers are not memory Device registers don t behave like RAM! Status words Incoming data Writes to registers are used to trigger actions Sending data Resetting state machines Dealing with caches Register value changes cache becomes inconsistent Write- Reads and writes cannot be combined into cache lines Line- Even spurious reads trigger device state changes Device register access must bypass the cache PTEs I/O space access isn t cached anyway

Other challenges 1. How to avoid polling all the time? 2. How to avoid the CPU copying all the data? caches? Can the CPU get on with something else? 3. How are the physical addresses allocated? Today: I/O Devices Avoiding polling Solution: device can interrupt the processor address vector Interrupts CPU Interrupt-request line triggered by I/O device - or level-triggered Interrupt handler receives interrupts Maskable to ignore or delay some interrupts Interrupt vector to dispatch interrupt to correct handler Based on priority Some nonmaskable 0 Divide error 1 Debug exception 2 Non-maskable interrupt (NMI) 3 4 5 Bound 6 Invalid opcode 7 Coprocessor not available 9 Coprocessor A B Segment not present C D E Page Fault F Reserved 10 11 12 13 14-1F Reserved to Intel 20-FF Available for external interrupts Programmable Interrupt Controllers NMI INTR INTA D0 D7 PIC IRQ0 IRQ1 IRQ<n-1> Device 1 Device 2 Device n

Today: I/O Devices Avoid programmed I/O DMA controller Generally built-in these days device and memory Can save memory bandwidth Old- 5. interrupts CPU to complete 4. byte to address A, increments A, decrements S. controller PCI bus CPU Cache Frontside (memory) bus controller 1. controller 2. 3. Decoupling Doesn t pollute CPU cache Can be processed when the CPU (or OS) decides Higher- Possible disadvantages: Generally not a problem (even the UARTs inconsistent with CPU caches Options: 1. -cacheable large hit probably wants to process data anyway 2. 3. physical Appear on external bus User and OS code deal with virtual addresses (mostly) OS (and device drivers) must manually translate virtual not be contiguous in physical address space Scatter-gather

Today: I/O Devices Where are all the registers? Where are the device registers in the physical address space? And which interrupt vectors correspond to them? Solution: built into the I/O bus design Example: PCI PCI is Peripheral Component Interconnect -X, PCI-Express, etc. devices - PCIe has succeeded PCI, but extends the same software-visible interface PCI tries to solve many problems: Device discovery Finding out which devices are in the system Address allocation Which addresses should each device s registers appear at? Interrupt routing which exception vectors? Physical connections: PCI is a tree NIC Sound USB CPU PCI-PCI bridge PCI root bridge Wireless SCSI Graphics Also: PCI-ISA PCI-USB PCI-SCSI Etc. PCI address space is flat Physical address space (32-bit or 64-bit) I/O address space (usually 16-bit) Bridges up the tree remap addresses to the device Result: In memory space

-describing Accessed through parent bridge, initially Plus: Bits Description 16 ID ( Intel, 3Com, NVidia, etc. 16 24 Interrupts Etc. Finding all the devices Find the PCI root complex bridge Large PCs can have more than one recurse Result is: Allocating addresses bridge s range Each bridge has a range which includes all it s children. Each range is aligned to some power- -2 boundary Then program: Each device with base-address/range (BAR) registers Four interrupt lines PCI Interrupts INTA, INTB, INTC, INTD Translated by root bridge into system interrupt PCI Express introduces -signalled interrupts Translated by root bridge into system interrupt Interrupts can be individually steered to particular cores/apics PCI allows Bus Mastering Device can issue read/write transactions to anywhere in memory Even (in some cases) other PCI devices Principle applies: device Today: I/O Devices

Intelligent devices Example: Intel e1000 PCI- Express Ethernet card Devices can now autonomously access any: Other devices CPU Device interaction Extensive in- in hardware etc. Summary Devices and the CPU communicate via: Interrupts and interrupt vectors I/O Interconnects (such as PCI): Allow devices to share/allocate physical addresses Allocate interrupts Permit bus mastering direct memory access