QUIZ Ch.6 The EAT for a two-level memory is given by: EAT = H × Access_C + (1 − H) × Access_MM. Derive a similar formula for a three-level memory: L1, L2 and RAM. Hint: instead of a single H, we now have H1 and H2. Source: http://stackoverflow.com/questions/4087280/approximate-cost-to-access-various-caches-and-main-memory
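One way to sketch the derivation, following the same pattern as the two-level formula (and inheriting its simplification of charging only one level's access time per reference): a reference hits L1 with probability H1; of the misses, a fraction H2 hits L2; the rest go to main memory.

```latex
EAT = H_1 \cdot Access_{L1}
    + (1 - H_1)\, H_2 \cdot Access_{L2}
    + (1 - H_1)(1 - H_2) \cdot Access_{MM}
```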
QUIZ Ch.6 Use the formula for three-level memory to compute the EAT, assuming: H1 = 0.93, H2 = 0.97, local DRAM, f = 2.5 GHz. Source: http://stackoverflow.com/questions/4087280/approximate-cost-to-access-various-caches-and-main-memory
Chapter 7 Input/Output and Storage Systems
7.1 Introduction Data storage and retrieval is one of the primary functions of computer systems. One could easily argue that computers are more useful to us as data storage and retrieval devices than as computational machines. All computers have I/O devices connected to them, and to achieve good performance I/O should be kept to a minimum! In studying I/O, we seek to understand the different types of I/O devices as well as how they work.
7.2 I/O and Performance Sluggish I/O throughput can have a ripple effect, dragging down overall system performance. This is especially true when virtual memory is involved! The fastest processor in the world is of little use if it spends most of its time waiting for data. If we really understand what's happening in a computer system, we can make the best possible use of its resources.
Remember the lesson of caches (Ch.6) Not in text: As x86 microprocessors reached clock rates of 20 MHz and above in the Intel 386, small amounts of fast cache memory began to be featured in systems to improve performance (1985). This was because the DRAM used for main memory had significant latency, up to 120 ns, as well as refresh cycles. Source: http://en.wikipedia.org/wiki/CPU_cache#In_x86_microprocessors In other words, the memory became a bottleneck. We need a way to quantify the impact of various improvements on the overall performance of the computer system we're designing!
7.3 Amdahl's Law The overall performance of a system is a result of the interaction of all of its components. System performance is most effectively improved when the performance of the most heavily used components is improved. This idea is quantified by Amdahl's Law (1967): S = 1 / ((1 − f) + f/k), where S is the overall speedup, f is the fraction of work performed by the faster component, and k is the speedup of the faster component.
QUIZ: Amdahl's Law Assume the component we plan to upgrade is responsible for half (f = 0.5) of the running time of a program. Plug the following values for k into the formula: k = 1 (i.e. no speedup), k = 2 (i.e. the component works twice as fast), k = 10, k = 10,000. What do we notice?
Silly QUIZ: Amdahl's Law Assume the component we plan to upgrade is responsible for half (f = 0.5) of the running time of a program. Plug the following values for k into the formula: k = 0.5 (i.e. ???), k = 0.25 (i.e. ???), k = 0.0 (i.e. ???). What do we notice?
QUIZ: Amdahl's Law On a large system, suppose we can: upgrade a CPU to make it 50% faster for $10,000, or upgrade its disk drives for $7,000 to make them 150% faster. Processes spend 70% of their time running in the CPU and 30% of their time waiting for disk service. An upgrade of which component would offer the greater benefit for the lesser cost? (solve half-and-half)
QUIZ: Amdahl's Law The processor option offers a 30% speedup: S = 1 / (0.30 + 0.70/1.5) ≈ 1.30. And the disk drive option gives a 22% speedup: S = 1 / (0.70 + 0.30/2.5) ≈ 1.22.
Problem: there are two variables, speedup and cost. How can we compare them meaningfully?
Each 1% of improvement costs $333 for the processor ($10,000/30) and $318 for the disk ($7,000/22).
Should price/performance be our only concern?
7.4 I/O Architectures I/O = the subsystem of components that moves coded data between external devices and a host system (CPU + MM, most likely the motherboard). I/O examples: blocks of MM devoted to I/O functions; buses that move data into and out of the system; control modules in the host and in peripheral devices; interfaces to external components such as keyboards and disks; cabling or communications links between the host system and its peripherals (USB, SATA, SCSI, RS-232, etc.).
Generic I/O diagram (interfaces, protocols, handshakes, buffers, physical conversion, logical conversion)
7.4 I/O Architectures I/O can be controlled in five general ways: Programmed (a.k.a. polled) I/O reserves a register for each I/O device; the CPU continually polls these registers (the "new data" bit) to detect data arrival. Interrupt-driven I/O allows the CPU to do other things until I/O is requested. Memory-mapped I/O shares the memory address space between I/O devices and program memory. Direct memory access (DMA) offloads I/O processing to a special-purpose chip that takes care of the details. Channel I/O uses dedicated I/O processors.
Interrupt-driven I/O Each device connects its interrupt request line to the interrupt controller. The controller signals the CPU by activating INT. When ready, the CPU responds by activating INTA. The controller then deactivates INT and starts transferring data through the data lines (D0, D1, ...).
Interrupt-driven I/O What if several input devices place requests at the same time? A: The interrupt controller handles priorities.
Remember Section 4.9, MARIE Instruction Processing: interrupt processing involves adding another step to the fetch-decode-get operand-execute cycle. ISR = Interrupt Service Routine. The starting addresses of all ISRs are stored in an Interrupt Vector Table.
Interrupt-driven I/O The status of the interrupt signal is checked at the top of the fetch-decode-get operand-execute cycle. The particular code that is executed whenever an interrupt occurs is determined by a set of addresses called interrupt vectors, which are stored in low memory. The system state is saved before the interrupt service routine is executed and is restored afterward.
Interrupt-driven I/O (diagram)
Memory-mapped I/O The I/O devices and the MM share the same address space. Each I/O device has its own reserved block of memory. To the CPU, memory-mapped I/O looks just like a regular MM access. Thus the same instructions are used to move data to and from both I/O and MM, greatly simplifying system design. In small systems the low-level details of the data transfers are offloaded to the I/O controllers built into the I/O devices.
DMA DMA takes the CPU out of the process of transferring data between MM and I/O devices, but DMA and the CPU still share the data bus! DMA has higher priority and steals memory cycles from the CPU. Why? Because I/O operations are subject to timeouts.
DMA DMA transfers can either occur one byte at a time or all at once in burst mode. If they occur a byte at a time, the CPU can access memory on alternate bus cycles; this is called cycle stealing, since the DMA controller and the CPU contend for memory access. In burst mode DMA, the CPU can be put on hold while the DMA transfer occurs, and a full block of possibly hundreds or thousands of bytes can be moved.
DMA DMA is used by disk drive controllers, graphics cards, network cards, sound cards, etc. DMA is also used for transferring data between cores on the same CPU chip.
Channel I/O Used in large systems (mainframes, servers), where the many I/O requests would otherwise slow down CPU performance significantly. One or more I/O processors (IOPs) control various channel paths. Slower devices such as terminals and printers are combined (multiplexed) into a single faster channel. Unlike DMA, there is a separate I/O bus!
Channel I/O configuration (diagram)
Channel I/O Distinguished from DMA by the intelligence of the IOPs: the IOP negotiates protocols, issues device commands, translates storage coding to memory coding, and can transfer entire files or groups of files independent of the host CPU. The host has only to create the program instructions for the I/O operation and tell the IOP where to find them. Example: backing up a large file from HDD to tape.
Character I/O vs. block I/O Character I/O devices process one byte (or character) at a time; examples include modems, keyboards, and mice. Keyboards are usually connected through an interrupt-driven I/O system. Block I/O devices handle bytes in groups; most mass storage devices (disk and tape) are block I/O devices. Block I/O systems are most efficiently connected through DMA or channel I/O.
To do for Ch.7 review: Read text sections 7.1, 7.2, 7.3, 7.4 (stop before 7.4.3, I/O bus operation). Read "What Do We Really Mean by Speedup?" (pp. 403-405) and understand the 3 examples given there. Answer Review Questions 1-14. Solve in notebook Exercises 1 and 4.