
University of California, Berkeley
College of Engineering
Computer Science Division, EECS
Spring 1998
D.A. Patterson

Quiz 2 Solutions (April 22, 1998)
CS252 Graduate Computer Architecture

Question 1: Bigger, Better, Faster?

A computer system has the following characteristics:

- Uses 10 GB disks that rotate at 10000 RPM, have a data transfer rate of 10 MByte/s (for each disk), and have an 8 ms seek time
- Has an average I/O size of 32 KByte
- Is limited only by the disks
- Has a total of 20 disks

Each disk can handle only one request at a time, but each disk in the system can be handling a different request. The data is not striped (all I/O for each request has to go to one disk).

a) What is the average service time for a request?

service time = seek time + rotational latency + transfer time
seek time = 8 ms
rotational latency = 1/2 rotation × (1 min / 10000 rotations) × (60 sec / 1 min) = 3 ms
transfer time = 32 KBytes / (10 × 2^20 Bytes/sec) = 3.125 ms
service time = 8 ms + 3 ms + 3.125 ms = 14.125 ms

b) Given the average I/O size from above and a random distribution of disk locations, what is the maximum number of I/Os per second (IOPS) for the system?

IOPS = 1 / service time = 1 / .014125 sec = 71

So, a single disk can support 71 IOPS. Therefore, the overall IOPS = 20 × 71 = 1420 IOPS.

Someone suggests improving the system by using new, better disks. For the same total price as the original disks, you can get disks that have 9 GB each, rotate at 12000 RPM, transfer at 12 MB/s, and have a 6 ms seek time.

c) What would be the average service time for a request in the new system?

service time = seek time + rotational latency + transfer time
seek time = 6 ms
rotational latency = 1/2 rotation × (1 min / 12000 rotations) × (60 sec / 1 min) = 2.5 ms
transfer time = 32 KBytes / (12 × 2^20 Bytes/sec) = 2.60 ms
service time = 6 ms + 2.5 ms + 2.60 ms = 11.10 ms
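The arithmetic above is easy to script. Here is a minimal Python sketch (not part of the original solutions; the function and variable names are illustrative) that reproduces parts a) through d) under the quiz's service-time model:

    # Per-request service time (ms) for one disk, per the quiz's model:
    # service time = seek + half a rotation + transfer, one request per disk.
    def disk_service_time(seek_ms, rpm, xfer_mb_s, io_kbytes):
        rotational_ms = 0.5 * 60_000.0 / rpm                        # half a rotation
        transfer_ms = io_kbytes * 2**10 / (xfer_mb_s * 2**20) * 1000.0
        return seek_ms + rotational_ms + transfer_ms

    old = disk_service_time(seek_ms=8, rpm=10_000, xfer_mb_s=10, io_kbytes=32)
    new = disk_service_time(seek_ms=6, rpm=12_000, xfer_mb_s=12, io_kbytes=32)
    print(old, 20 * 1000.0 / old)   # 14.125 ms; ~1416 IOPS (1420 with per-disk rounding)
    print(new, 11 * 1000.0 / new)   # ~11.10 ms; ~991 IOPS (990 with per-disk rounding)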

Question 1 (continued)

d) What is the maximum number of IOPS in the new system?

IOPS = 1 / service time = 1 / .01110 sec = 90

So, a single disk can support 90 IOPS. Therefore, the overall IOPS = 11 × 90 = 990 IOPS.

e) Treat the entire system as an M/M/m queue (that is, a system with m servers rather than one), where each disk is a server. All requests are in a single queue. Requests may not overlap. Assume both systems receive an average of 950 I/O requests per second. Assume that any disk can service any request. What is the mean response time of the old system? The new one? You might find the following equations for an M/M/m queue useful:

Server utilization = Arrival rate × Time_server / m
Time_system = Time_server × (1 + Server utilization / (m × (1 − Server utilization)))

Old system:

utilization = 950 × .014125 / 20 ≈ .6709
T_s = .014125 × (1 + .6709 / (20 × (1 − 0.6709))) = 15.56 ms

New system:

utilization = 950 × .01110 / 11 ≈ .9586
T_s = .01110 × (1 + .9586 / (11 × (1 − 0.9586))) = 34.47 ms

f) Which system has a lower average response time? Why?

The system with 20 disks has a lower average response time. Even though each disk has worse performance, the larger number of disks means that the old system is capable of more IOPS, and hence has a lower utilization. Thus, the waiting time is much lower than on the new system.
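The two formulas translate directly into code; a small sketch (again not from the original solutions; names illustrative) evaluating both configurations:

    # Mean response time of an M/M/m queue, per the formulas above.
    def mm_m_response_time(arrival_rate, service_time_s, m):
        utilization = arrival_rate * service_time_s / m
        return service_time_s * (1 + utilization / (m * (1 - utilization)))

    print(mm_m_response_time(950, 0.014125, 20))  # ~0.01556 s: old system, 15.56 ms
    print(mm_m_response_time(950, 0.01110, 11))   # ~0.03449 s: new system, 34.47 ms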

Question 2: A MESI Situation

Figure 1 below shows the three-phase write-back cache coherence protocol from the book.

[Figure 1: Three-Phase Protocol. A state diagram over the states Invalid, Shared (read only), and Exclusive (read/write); the transitions are labeled with the codes defined below.]

The following terminology is used (a label denotes either a CPU stimulus causing a transition, an operation on the bus causing a transition, or a CPU action on the bus):

Label   Stimulus or action
CRH     CPU read hit
CRM     CPU read miss
CWH     CPU write hit
CWM     CPU write miss
BRM     read miss for this block
BWM     write miss for this block
PRM     place CPU read miss on bus
PWM     place CPU write miss on bus
WB      write back cache block

Question 2 (continued)

Figure 2 below shows a write-back MESI (Modified, Exclusive, Shared, Invalid) protocol. Assume that the processor is able to detect whether a read miss is a shared read miss or an exclusive read miss.

[Figure 2: MESI Protocol. A state diagram over the states Invalid, Read Only (Shared), Read Only (unshared, or clean Exclusive), and Read/Write (dirty exclusive, or Modified); the transitions are labeled with the codes defined below.]

The following terminology is used (a label denotes either a CPU stimulus causing a transition, an operation on the bus causing a transition, or a CPU action on the bus):

Label   Stimulus or Action
CRH     CPU read hit
CRMs    CPU read miss (shared)
CRMx    CPU read miss (exclusive)
CWH     CPU write hit
CWM     CPU write miss
BRM     read miss for this block
BWM     write miss for this block
PRM     place CPU read miss on bus
PWM     place CPU write miss on bus
WB      write back cache block
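The state diagrams themselves did not survive transcription, but the per-block transitions they encode are the standard MESI ones and can be written down as a table. Below is a minimal Python sketch (a reconstruction under that assumption, not the original figure), using the quiz's labels; it is consistent with the trace in the next part:

    # Per-block MESI transitions: (state, stimulus) -> (next state, bus actions).
    # CPU stimuli: CRH, CRMs, CRMx, CWH, CWM; snooped bus operations: BRM, BWM.
    MESI = {
        ("Inv",  "CRMs"): ("Shar", ["PRM"]),   # read miss, another cache has a copy
        ("Inv",  "CRMx"): ("Excl", ["PRM"]),   # read miss, no other copy
        ("Inv",  "CWM"):  ("Mod",  ["PWM"]),
        ("Shar", "CRH"):  ("Shar", []),
        ("Shar", "CWH"):  ("Mod",  ["PWM"]),   # must invalidate the other copies
        ("Shar", "BWM"):  ("Inv",  []),
        ("Excl", "CRH"):  ("Excl", []),
        ("Excl", "CWH"):  ("Mod",  []),        # silent upgrade: no bus traffic
        ("Excl", "BRM"):  ("Shar", []),
        ("Excl", "BWM"):  ("Inv",  []),
        ("Mod",  "CRH"):  ("Mod",  []),
        ("Mod",  "CWH"):  ("Mod",  []),
        ("Mod",  "BRM"):  ("Shar", ["WB"]),    # supply the dirty block
        ("Mod",  "BWM"):  ("Inv",  ["WB"]),
    }

    def step(state, stimulus):
        """Apply one stimulus to a block; return (next state, bus actions)."""
        return MESI[(state, stimulus)]

The clean Exclusive state is the whole point of MESI: a write hit to an unshared, clean block upgrades to Modified with no bus transaction, which is exactly the "(none)" entry in the trace below.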

Question 2 (continued)

Here is a sequence of memory accesses. Assume only 2 processors, with the value 5 stored in address A. All cache locations start out in the invalid state.

- P1 reads A
- P1 writes 10 to A
- P2 reads A
- P2 writes 15 to A

Below are the actions that occur for the above sequence on a group of machines using the three-phase protocol. Mark in the table any of the actions that change when the machines use the MESI protocol. Only show the items that change. Extra blank lines have been provided for you to show your changes. There may be more blank lines than you need. "Read" for a bus action means that a processor is reading the value that is on the bus. In the table below, a bus action in one line affects processors and memory in the next line. For bus actions, denote a shared read miss as "RdMsS" and an exclusive read miss as "RdMsX". For states, use "Mod", "Excl", "Shar", or "Inv" to represent the read/write, read only unshared, read only shared, and invalid states. Use "(none)" to represent an item that exists in the three-phase protocol, but not in the MESI protocol.

                    P1                P2                Bus                     Memory
Operation       State Addr Val   State Addr Val   Action Proc Addr Val      Addr Val
P1 Rd A         Shar  A                           RdMs   P1   A             A    5
  MESI:         Excl  A                           RdMsX  P1   A
                Shar  A    5                      Read        A    5        A    5
  MESI:         Excl  A    5
P1 Wr 10 to A   Excl  A    10                     WrMs   P1   A             A    5
  MESI:         Mod   A    10                     (none)
P2 Rd A         Excl  A    10    Shar  A          RdMs   P2   A             A    5
  MESI:         Mod   A    10    Shar  A          RdMsS  P2   A
                Shar  A    10    Shar  A          WrBk   P1   A    10       A    5
                Shar  A    10    Shar  A    10                              A    10
P2 Wr 15 to A   Shar  A    10    Excl  A    15    WrMs   P2   A             A    10
  MESI:         Shar  A    10    Mod   A    15    WrMs   P2   A
                Inv   A          Excl  A    15                              A    10
  MESI:         Inv   A          Mod   A    15

(The write-back and data-return lines in the middle of the table are the same in both protocols.)
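Driving the transition table from the earlier sketch through this access sequence reproduces the MESI rows above; a hypothetical driver (it reuses the step function defined before):

    # Walk both caches through the access sequence with the MESI table above.
    p1 = p2 = "Inv"
    p1, _ = step(p1, "CRMx")   # P1 reads A: no other copy, so RdMsX; P1 -> Excl
    p1, _ = step(p1, "CWH")    # P1 writes 10: silent upgrade, no bus action; P1 -> Mod
    p1, _ = step(p1, "BRM")    # P2's read miss snooped: write back; P1 -> Shar
    p2, _ = step(p2, "CRMs")   # P2 reads A: P1 has a copy, so RdMsS; P2 -> Shar
    p2, _ = step(p2, "CWH")    # P2 writes 15: write miss on bus; P2 -> Mod
    p1, _ = step(p1, "BWM")    # P1 snoops the write miss; P1 -> Inv
    print(p1, p2)              # Inv Mod, matching the last MESI line of the table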

Question 3: Cluster vs SMP*

Evaluate the resource utilization while performing streaming I/O on the following three architectures:

- A single workstation
- A cluster of workstations
- A symmetric multiprocessor (SMP)

The basis for the first two architectures is shown in Figure 3. The cluster is built of 8 copies of the single workstation and is shown in Figure 4. The workstation contains a 167 MHz processor (with 512 KB of L2 cache) and 128 Mbyte of memory. The memory bus is 128 bits wide and operates at 83.3 MHz. The workstation contains one 32-bit, 25 MHz I/O bus (called the S-Bus). Attached to this I/O bus are two fast-wide (16-bit, 10 MHz) SCSI controllers. In the cluster, a Myrinet network interface, which is a switch-based network that can support 1280 Mbit/s in each direction, is also installed in each machine; the machines are all connected to a single eight-port switch.

[Figure 3: The Workstation. Processor and memory sit on the 128-bit, 83.3 MHz memory bus; an I/O chip bridges to the 32-bit, 25 MHz S-Bus, which carries two 16-bit, 10 MHz SCSI controllers (with disks) and the Myrinet network interface (1280 Mbit/s).]

* This problem is based on a simplified version of the study "The Architectural Costs of Streaming I/O: A Comparison of Workstations, Clusters, and SMPs" by Remzi H. Arpaci-Dusseau, Andrea C. Arpaci-Dusseau, David E. Culler, Joseph M. Hellerstein, and David A. Patterson from the Fourth International Symposium on High-Performance Computer Architecture.

Question 3 (continued)

[Figure 4: The Cluster. Eight workstations, each with a Myrinet network interface, connected to a single eight-port Myrinet switch.]

The SMP is shown in Figure 5. The system consists of four CPU/Memory boards and four S-Bus I/O boards connected via the GigaPlane memory bus. The GigaPlane is a 256-bit wide, 83.3 MHz bus. Each CPU/Memory board contains two 167 MHz processors (each with 512 KB of L2 cache) and 256 Mbyte of memory. Each I/O board contains two S-Busses. Each S-Bus has one fast-wide (16-bit, 10 MHz) SCSI controller. All communication is performed via loads and stores to shared memory. All memory accesses have uniform access time.

[Figure 5: The SMP. Four CPU/Memory boards (two processors plus memory each) and four S-Bus I/O cards (two 32-bit, 25 MHz S-Busses each, with one 16-bit, 10 MHz SCSI controller per S-Bus) on the 256-bit, 83.3 MHz GigaPlane.]

Question 3 (continued)

The streaming I/O benchmark we will use is a sorting benchmark. The benchmark processes 100-byte records that include 10-byte keys. The basic algorithm is the same on all three platforms. In the first step, the records must be converted from the layout on disk to a format more suitable for efficient sorting. As records are read from disk, the key (which is part of the record) and a pointer to the full record are placed into buckets based on the top few bits of the key; this improves the cache behavior of the sort in two ways. First, the sort operates on only <partial key, pointer> pairs, thus copying only 8 bytes rather than 100-byte records as keys are compared and swapped. Second, the number of keys in each bucket matches the size of the second-level cache. The next step sorts the keys in each bucket. Assume that the data is initially randomly placed over all disks.

The basic algorithm has been slightly tailored for best performance on each platform. Figures 6, 7, and 8 show a graphical representation of the read phase for each platform. The arrows show the order and direction of data that moves across busses, but do not show the relative sizes of each transfer. The following paragraphs refer to the numbers in those figures.

[Figure 6: Workstation Sort Read Phase. Numbered data movements between disk, memory, and processor.]

In the workstation read phase, the input file is read into the user's address space (1). These records are then copied to an input buffer (2, 3). Each key is examined (4), and a <partial key, pointer> pair is written into the correct bucket (5).

[Figure 7: Cluster Sort Read Phase. Numbered data movements between disk/network, memory, and processor.]

In the cluster read phase, the input file is read into the user's address space (1). Records are then copied into one of 8 send buffers (2, 3); as each buffer fills, it is sent to the appropriate destination processor (4). Upon receipt of records from other processors (5), records are copied into a record buffer (6, 7). Then, each key is examined, and a <partial key, pointer> pair is written into the bucket array (8, 9), as in the single workstation sort.
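The bucketing step itself is simple; a minimal Python sketch for illustration only (the 4-byte key prefix making an 8-byte <partial key, pointer> pair, and all names, are assumptions, and whole records are assumed):

    import struct

    REC, KEY, TOP_BITS = 100, 10, 4      # 100-byte records, 10-byte keys, 16 buckets

    def bucketize(data):
        """Scan records; file each <partial key, pointer> pair by the key's top bits."""
        buckets = [[] for _ in range(2 ** TOP_BITS)]
        for off in range(0, len(data), REC):         # assumes len(data) % REC == 0
            key = data[off:off + KEY]
            partial = struct.unpack(">I", key[:4])[0]   # 4-byte key prefix
            buckets[key[0] >> (8 - TOP_BITS)].append((partial, off))
        return buckets

In the real sort, the number of top bits is chosen so that each bucket's pairs fit in the 512 KB second-level cache, which is the second cache benefit described above.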

Question 3 (continued)

[Figure 8: SMP Sort Read Phase. Numbered data movements between disk, memory, and processor.]

In the SMP read phase, the input file is read into the user's address space (1). Records are then copied into an input buffer (2, 3). Each key is examined (4), and a <partial key, pointer> pair is written into the correct bucket buffer (5). When a bucket buffer fills, the processor copies the <partial key, pointer> pairs (6, 7) and records (8, 9) into a global array.

The GigaPlane bus can sustain 94% of its theoretical maximum transfer rate. The SCSI bus can sustain 80% of its theoretical maximum transfer rate. The workstation and cluster memory bus can sustain 75% of its theoretical maximum transfer rate. The S-Bus can sustain 55% of its theoretical maximum transfer rate.

The table below shows the number of millions of instructions required to process each megabyte of data on the disk for the different platforms. The differences are mainly from the overhead of sending and receiving network messages, and from slightly different ways of zeroing pages on the different platforms.

             Cluster   SMP
Read Phase     5.5     4.6

The table below shows the measured CPI for each platform while running the benchmark.

             Cluster   SMP
Read Phase     2.2     2.2

Question 3 (continued)

a) Determine how much of each resource (I/O bus and memory bus) is used during the read phase of the sort for each platform. First, write a general equation for how much of each resource is used in terms of the rate data is read from disk (D_r), the number of processors in the cluster or SMP (P), and the sizes of the records (rec), keys (key), and <partial key, pointer> pairs (bucket). D_r is the total rate that data is read from disks (the sum of all the individual disk rates). Give the combined bandwidth required for all the busses. Then, fill in the table on the next page with the summary. Provide a short justification for these equations. The resource usage for the workstation sort has been completed as an example.

Workstation

Memory Bus: During the read phase, data is read from disk (D_r), then copied into memory (2 D_r). The keys are read ((key/rec) D_r), and <partial key, pointer> pairs are written to the right bucket ((bucket/rec) D_r).

I/O Bus: Data is read from the disk (D_r).

Cluster

Memory Bus: During the read phase, data is read from disk (D_r), then copied into buffers (2 D_r). Then, blocks are sent to other processors (((P-1)/P) D_r), and received from other processors (((P-1)/P) D_r), then copied into buffers (2 D_r). After this, keys are read ((key/rec) D_r), and <partial key, pointer> pairs are written ((bucket/rec) D_r).

I/O Bus: Data is read from the disk (D_r), and blocks are sent to and received from other processors (2 ((P-1)/P) D_r).

SMP

Memory Bus: During the read phase, data is read from disk (D_r), then copied into buffers (2 D_r). Then, each key is examined ((key/rec) D_r), and <partial key, pointer> pairs are written ((bucket/rec) D_r). Once the buckets fill, <partial key, pointer> pairs are copied (2 (bucket/rec) D_r) and records are copied (2 D_r) into a global array.

I/O Bus: Data is read from the disk (D_r).

Question 3 (continued)

Resource Usage:

Workstation Memory Bus:  D_r + 2 D_r + (key/rec) D_r + (bucket/rec) D_r
Workstation I/O Bus:     D_r
Cluster Memory Bus:      D_r + 2 D_r + 2 ((P-1)/P) D_r + 2 D_r + (key/rec) D_r + (bucket/rec) D_r
Cluster I/O Bus:         D_r + 2 ((P-1)/P) D_r
SMP Memory Bus:          D_r + 4 D_r + (key/rec) D_r + 3 (bucket/rec) D_r
SMP I/O Bus:             D_r

b) Fill in the values of the general equations from part (a), using the following values: 8 processors in the cluster and SMP (P), 100-byte records (rec), 10-byte keys (key), and 8-byte <partial key, pointer> pairs (bucket). Leave the term D_r in your equations. The read phase of the workstation sort has been completed as an example.

             Memory Bus Usage   I/O Bus Usage
Workstation      3.18 D_r           D_r
Cluster          6.93 D_r          2.75 D_r
SMP              5.34 D_r           D_r
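The part (b) multipliers follow mechanically from the part (a) equations; a small Python sketch (names illustrative) as a cross-check:

    # Evaluate the part (a) traffic equations as multiples of D_r.
    P, rec, key, bucket = 8, 100, 10, 8

    ws_mem  = 1 + 2 + key/rec + bucket/rec                      # 3.18
    cl_mem  = 1 + 2 + 2*(P - 1)/P + 2 + key/rec + bucket/rec    # 6.93
    cl_io   = 1 + 2*(P - 1)/P                                   # 2.75
    smp_mem = 1 + 4 + key/rec + 3*bucket/rec                    # 5.34
    print(ws_mem, cl_mem, cl_io, smp_mem)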

Question 3 (continued)

c) Each disk can read data at 5.5 Mbyte/s. Assume disks are organized the most efficient way possible (the disks are equally spread over all the busses available). If we use 2 disks per processor, what is the utilization of each resource (SCSI bus, I/O bus, memory bus, processor) during the read phase of the sort for only the cluster and SMP platforms? (Determine utilization as a percent of the maximum sustainable transfer rate for each bus.)

SCSI Bus

Cluster: 2 disks per processor, 2 SCSI busses per processor, therefore one disk per SCSI bus.

5.5 MB/s / (.8 × 2 Bytes × 10 MHz) = 34.38%

SMP: 2 disks per processor, 1 SCSI bus per processor, therefore two disks per SCSI bus.

(2 × 5.5 MB/s) / (.8 × 2 Bytes × 10 MHz) = 68.75%

I/O Bus

Cluster: The cluster bandwidth required on the I/O bus is 2.75 times the bandwidth read from disk.

(2.75 × 2 × 5.5 MB/s) / (.55 × 4 Bytes × 25 MHz) = 55%

SMP: The SMP bandwidth required on the I/O bus is the same as the bandwidth read from disk.

(1 × 2 × 5.5 MB/s) / (.55 × 4 Bytes × 25 MHz) = 20%
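Each utilization above is just required bandwidth over sustainable bandwidth; a sketch (function name illustrative):

    # Utilization of a bus: required MB/s over sustainable MB/s
    # (width in bytes x clock in MHz x sustainable fraction).
    def utilization(required_mb_s, width_bytes, mhz, sustain_frac):
        return required_mb_s / (sustain_frac * width_bytes * mhz)

    disk = 5.5  # MB/s per disk; 2 disks per processor
    print(utilization(1 * disk,        2, 10, 0.80))   # cluster SCSI bus: ~0.344
    print(utilization(2 * disk,        2, 10, 0.80))   # SMP SCSI bus:     ~0.688
    print(utilization(2.75 * 2 * disk, 4, 25, 0.55))   # cluster S-Bus:     0.55
    print(utilization(1 * 2 * disk,    4, 25, 0.55))   # SMP S-Bus:         0.20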

Question 3 (continued)

Memory Bus

Cluster: The cluster bandwidth required on the memory bus is 6.93 times the bandwidth read from disk.

(6.93 × 2 × 5.5 MB/s) / (.75 × 16 Bytes × 83.3 MHz) = 7.6%

SMP: The SMP bandwidth required on the memory bus is 5.34 times the bandwidth read from disk. Since this is a shared memory bus, the total bandwidth required will be 8 times greater (since we have 8 processors).

(8 × 5.34 × 2 × 5.5 MB/s) / (.94 × 32 Bytes × 83.3 MHz) = 18.75%

CPU

Cluster: The cluster requires 5.5 million instructions per megabyte of data. The CPI during the read phase is 2.2.

(5.5 MI/MB × 2.2 CPI × 2 × 5.5 MB/s) / 167 MHz = 79.70%

SMP: The SMP requires 4.6 million instructions per megabyte of data. The CPI during the read phase is 2.2.

(4.6 MI/MB × 2.2 CPI × 2 × 5.5 MB/s) / 167 MHz = 66.66%

e) Explain briefly which system scales the best (in terms of adding more disks) for this benchmark.

The SMP can add another disk at full bandwidth for this benchmark while the cluster cannot, because of the CPU utilization: at 79.70%, a cluster node cannot absorb the instruction cost of a third disk (1.5 × 79.70% exceeds 100%), while at 66.66% the SMP processors just can.
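The memory-bus and CPU numbers come from the same recipe as the bus utilizations earlier; a final Python sketch (names illustrative) collecting them:

    # Memory-bus and CPU utilization for the read phase (2 disks per processor).
    disk = 5.5   # MB/s per disk
    cl_mem  = (6.93 * 2 * disk) / (0.75 * 16 * 83.3)       # ~0.076  (7.6%)
    smp_mem = (8 * 5.34 * 2 * disk) / (0.94 * 32 * 83.3)   # ~0.1875 (18.75%)
    cl_cpu  = (5.5 * 2.2 * 2 * disk) / 167                 # ~0.797  (MI/MB x CPI x MB/s / MHz)
    smp_cpu = (4.6 * 2.2 * 2 * disk) / 167                 # ~0.667
    print(cl_mem, smp_mem, cl_cpu, smp_cpu)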
