CS 152 Computer Architecture and Engineering

Size: px
Start display at page:

Download "CS 152 Computer Architecture and Engineering"

Transcription

1 CS 152 Computer Architecture and Engineering Lecture Midterm II Review Session John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: CS 152 L16: Midterm I Review 1

2 Today - Midterm II Review Session Study Tips HW 2, problem by problem (if there is time) HKN CS 152 L16: Midterm I Review 2

3 Name: SSID: CS152 Midterm II May 1st, 2014 # Points All the work is my own. I have no prior knowledge of the exam contents, aside from guidance from class staff. I will not share the contents with others in CS152 who have not taken it yet. Signature: Please write clearly, and put your name on each page. Please abide by word limits. Good luck! Tot 100 Eric Love John Lazzaro 3

4 What does it cover? Lectures 9 onward Focus will be on problems that require you to do a task (write a small program, trace through execution,etc) that demonstrates that you understand a concept. [...] No transistor-level questions (DRAM and SRAM cells, etc) Time for a quick walk-through... CS 152 L16: Midterm I Review 4

5 CS 152 Computer Architecture and Engineering Lecture 9 -- Memory John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 5

6 Latency is not the same as bandwidth! Thus, push to faster DRAM interfaces 13-bit row address input 1 o f d e c o d e r CS 152 L9: Memory What if we want all of the bits? In row access time (55 ns) we can do 22 transfers at 400 MT/s. 16-bit chip bus -> 22 x 16 = 352 bits << Now the row access time looks fast! 8192 rows columns usable bits (tester found good bits in bigger array) bits delivered by sense amps Select requested bits, send off the chip 6

7 CS 152 Computer Architecture and Engineering Lecture Cache I John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 7

8 CS L8: Cache Latency: A closer look Read latency: Time to return first byte of a random access Reg L1 Inst L1 Data L2 DRAM Disk Size 1K 64K 32K 512K 256M 80G Latency (cycles) E+07 Latency (sec) 0.6n 1.9n 1.9n 6.9n 100n 12.5m Hz 1.6G 533M 533M 145M 10M 80 Architect s latency toolkit: (1) Parallelism. Request data from N 1-bit-wide memories at the same time. Overlaps latency cost for all N bits. Provides N times the bandwidth. Requests to N memory banks (interleaving) have potential of N times the bandwidth. (2) Pipeline memory. If memory has N cycles of latency, issue a request each cycle, receive it N cycles later. UC Regents Fall 2008 UCB 8

9 CS 152 Computer Architecture and Engineering Lecture Cache II John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 9

10 Issue #4: When to write to lower level... Policy Do read misses produce writes? Do repeated writes make it to lower level? Write-Through Data written to cache block also written to lower-level memory No Yes Write-Back Write data only to the cache Update lower level when a block falls out of the cache Yes No Related issue: Do writes to blocks not in the cache get put in the cache ( writeallocate ) or not? CS 152 L11: Cache II 10

11 CS 152 Computer Architecture and Engineering Lecture Virtual Memory John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 11

12 The TLB caches page table entries TLB caches page table entries. virtual address page off CS 152 L15: Virtual Memory e Page Table TLB frame page Virtual Address V page no. Page Table Base for ASIDReg index into page table physical address frame page off Page Table V Access Rights 10 offset PA table located in physical memory MIPS handles TLB misses in software (random replacement). Other machines use hardware. In this example, physical and virtual pages must be the same size! P page no. Physical frame address offset 10 Physical Address V=0 pages either reside on disk or have not yet been allocated. OS handles V=0 Page fault UC Regents Fall 2006 UCB 12

13 CS 152 Computer Architecture and Engineering Lecture 13 - Synchronization John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 13

14 Non-blocking consumer synchronization Another atomic read-modify-write instruction: Compare&Swap(Rt,Rs, m) if (Rt == M[m]) then M[m] = Rs; Rs = Rt; /* do swap */ else /* do not swap */ Assuming sequential consistency: MEMBARs not shown... try: LW R3, head(r0) ; Load queue head into R3 spin: LW R4, tail(r0) ; Load queue tail into R4 BEQ R4, R3, spin ; If queue empty, wait LW R5, 0(R3) ; Read x from queue into R5 ADDI R6, R3, 4 ; Shift head by one word!! Compare&Swap R3, R6, head(r0); Try to update head BNE R3, R6, try ; If not success, try again If R3!= R6, another thread got here first, so we must try again. If thread swaps out before Compare&Swap, no latency problem; this code only holds the lock for one instruction! CS 152 L24: Multiprocessors UC Regents Fall 2006 UCB 14

15 CS 152 Computer Architecture and Engineering Lecture 14 - Cache Design and Coherence John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 15

16 Writes from 10,000 feet... for write-thru L1 For write-thru caches... Cache CPU0 Snooper CS 152 L14: Cache Design and Coherency Cache Memory bus CPU1 Shared Main Memory Hierarchy Snooper To a first-order, reads will just work if write-thru caches implement this policy. A two-state protocol (cache lines are valid or invalid ). 1. Writing CPU takes control of bus. 2. Address to be written is invalidated in all other caches. Reads will no longer hit in cache and get stale data. 3. Write is sent to main memory. Reads will cache miss, retrieve new value from main memory 16

17 CS 152 Computer Architecture and Engineering Lecture Advanced CPUs John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L15: Superscalars and Scoreboards 17

18 Split pipelines: a write-after-write hazard. Solution: SUB detects R1 clash in decode stage and stalls, via a pipe-write scoreboard. WAW Hazard DIV R1, R2, R3 SUB R1, R2, R3 If long latency DIV and short latency SUB are sent to parallel pipes, SUB may finish first. CS L9: Advanced Processors I The pipeline splits after the RF stage, feeding functional units with different latencies. UC Regents Fall 2008 UCB 18

19 IF (Fetch) ID (Decode) EX (ALU) MEM WB Superscalar R machine IR IR IR IR Instruction Issue Logic rs1 rs2 RegFile rd1 A op A L U 32 Y R 64 Data Instr Mem Addr 32 ws1 wd1 rs3 rs4 ws2 wd2 WE1 rd2 rd3 rd4 WE2 B A B op A L U 32 Y R PC and Sequencer IR IR IR IR IF (Fetch) ID (Decode) EX (ALU) CS L9: Advanced Processors I MEM WB UC Regents Fall 2008 UCB 19

20 CS 152 Computer Architecture and Engineering Lecture Networks, Routers, Google John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 20

21 6 key parameters scale across dimension of by one server, by 80-server rack and by array To get more DRAM and disk capacity, you must work on a scale larger than a single server. But as you do, latency and bandwidth degrade, because network performance << a server bus, and because array network is under-provisioned. Exception: disk latency is roughly scale-independent. 21

22 CS 152 Computer Architecture and Engineering Lecture Dynamic Scheduling I Thanks to Krste Asanovic... John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 22

23 Given an endless supply of registers... Rename architected registers (Ri, Fi) to new physical registers (PRi, PFi) on each write. Loop: LD F0,0(R1) ADDD SD What was gained? An instruction may execute once all of its source registers have been written. ADDI R1,R0,64 SUBI BNEZ NOP F4,F0,F2 0(R1),F4 F4,0(R1) R1,R1,8 R1,Loop CS 152 L18: Dynamic Scheduling I R1 PR01 F0 PF00!!! ADDI PR01,PR00,64!!! LD PF00 0(PR01)!!! ADDD PF04, PF00, PF02!!! SD PF04, 0(PR01)!!! SUBI PR11, PR01, 8!!! BEQZ PR11 ENDLOOP ITER2:!LD PF10 0(PR11)!!! ADDD PF14, PF10, PF02!!! SD PF14, 0(PR11)!!! SUBI PR21, PR11, 8!!! BEQZ PR21 ENDLOOP ITER3:!LD PF20 O(PR21)!!! [...] 23

24 CS 152 Computer Architecture and Engineering Lecture Dynamic Scheduling II John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 24

25 Rename stage close-up: (1) Allocates new physical registers for destinations, (2) Looks up physical register numbers for sources, (3) Handle rename dependences within the 4 issuing instructions in one clock cycle! Rename 2 Integer register rename Floatingpoint register rename For mis-speculation recovery Timestamped. Register numbers Instructions (4) Map Saved map state Map contentaddressable memories internal registers 80 in-flight instructions Reque Internal register numbers Input: 4 instructions specifying architected registers. Output: 12 physical registers numbers: 1 destination and 2 sources for the 4 instructions to be issued. 25

26 CS 152 Computer Architecture and Engineering Lecture Dynamic Scheduling III John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 26

27 Micro-op translation example... ADC m32, r32: // for a simple m32 address mode Becomes: LD T1 0(EBX); // EBX register point to m32 ADD T1, T1, CF; // CF is carry flag from EFLAGS ADD T1, T1, r32; // Add the specified register ST 0(EBX) T1; // Store result back to m32 Instruction traces of IA-32 programs show most executed instructions require 4 or fewer micro-ops. Translation for these ops are cast into logic gates, often over several pipeline cycles. CS 152 L20: Dynamic Scheduling III UC Regents Fall 2006 UCB 27

28 CS 152 Computer Architecture and Engineering Lecture Dataflow John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 28

29 Dataflow stages of Idea: Write dataflow programs that reference physical registers, to execute on this machine. Ebox Cluster 0 Input: Instructions that reference physical registers. Register scoreboard $. + 5 Execute -+ Reg File (80) Execute Ebox Cluster Media I Scoreboard: Tracks writes to physical registers. 29

30 CS 152 Computer Architecture and Engineering Lecture GPU + SIMD + Vectors I John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 30

31 Pure data move opcode. DEST X0 X0 X0 X0 X0 X0 m32 X0 X0 X0 Figure VBROADCASTSS Operation (VEX.256 encoded version) Or, part of a math opcode. X3 Y3 ADD X2 Y2 ADD X1 Y1 ADD X0 Y0 ADD Y2 + Y3 Y0 + Y1 X2 + X3 X0 + X1 Figure Horizontal Data Movement in PHADDD 31

32 CS 152 Computer Architecture and Engineering Lecture GPU + SIMD + Vectors II John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 32

33 Assume MacBook Air x 768 screen... We are all zoomed in on Google Maps Top pyramid image is 4K x 4K... Idea: Keep only a 1386 x 768 window of top images in RAM... Needed Clipped Clipped Eyepoint Near Lets us cache a 1024 x 1024 window of the 11 PB Earth map in 34.7 MB! Type and Size Full Mipmap 682KB 2.7MB 42.7MB 2.7GB 10923TB Clipmap 682KB 1.1MB 2.2MB 3.7MB 9.1MB Clipmap 682KB 2.7MB 6.7MB 12.7MB 34.7MB Clipmap 682KB 2.7MB 18.7MB 42.7MB 131.7MB 33

34 Zoom all the way in... units of pixels Map Levels (not to scale) Bottom stack image shows the smallest part of the 1 mile sq. patch of the Earth of any stack image. Portion of Source Image Covered by Map Level (one axis shown) clip map pyramid Clip-Map Pyramid Clip-Map Stack Map Levels units of sq. miles Clip-Map Center clip map stack Graphics hardware displays bottom stack image, which fills MacBook Air display. Viewer Position in Source Texture units of miles Hardware interpolation of stack levels. 34

35 CS 152 Computer Architecture and Engineering Lecture Voxel Processing John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 35

36 After processing... A 3-D matrix of cubes, in object space (X,Y,Z). 8-bit density value stored for each cube (0 = air ). 256^3 = 16 MB = 10 inch cube (for 1mm voxels) mm voxels? 8 GB Interesting to computer architects because n^3 grows so quickly! ;J l;lieiit : I I~ I I II i -[.!44~']4-f~.A. ;i I~ ~1 t ~ I <.,l,,,f.qdj4ff~-,ll& S i I; ~ i! b J ~ I"1 i i "dql.h'~>l~& I ~ I, I I I I ~ 1,~t,k[MLH'~ ::1 I ', I',I I/III m v ~p2" Smallest box Y shown is 16-cube OBJm-CT SPAC~ PARTITIONED AMONG 64 PROCESSORS 36

37 CS 152 Computer Architecture and Engineering Lecture Digital Imaging John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review 37

38 Camera interface to the outside world Simple Power Hookup Serial port to control the camera. Figure 4: 44-Ball icsp Assignment A B C D E F G 1 DGND DOUT4 DOUT5 DOUT7 DOUT LSB0 SADDR DGND 2 DOUT3 VDD DOUT6 VDDQ DOUT LSB1 VDD LINE VALID 3 DOUT2 DOUT1 DGNDQ DGNDQ FRAME VALID RESET# 4 VDDQ DOUT0 STAND BY VDDQ 5 CLK_IN PIXCLK DGNDQ DGNDQ OE# STROBE 6 SCLK VDD VDDQ VDDQ NC VAA NC 7 DGND DGNDQ SDATA 1 DGND VAAPIX NC AGND Two-Wire Serial Interface Master Clock Power-on Reset 1.7V 3.6V I/O Digital VDDQ SADDR SCLK SDATA CLKIN RESET# OE# STANDBY DGNDQ Digital GND 2.8V Core Digital VDD DGND 2.8V Analog VAA DOUT[7:0] FRAME_VALID LINE_VALID PIXCLK STROBE AGND VAAPIX Analog GND To CMOS Camera Port To Xenon or LED Flash Driver 8-bit Dout Port 54 MHz Clk 1280 x 15 fps 640 x 30 fps YCrCb 4:2:2 Top View (Ball Down) CS 250 L12: CMOS Imagers UC Regents Fall 2012 UCB 38

39 AWARE-2: Array of 98 phone camera modules (14 M-pixel) 1.3 G-pixel 3 frames/sec 39

40 On Thursday Mid-term II... Ground rules... 40

41 Mid-term: How to do well... Problem intro often features a lecture slide. If you have to teach yourself that slide during the test, you re starting out behind. Getting the problem correct requires thinking on your feet to do a new design or analyze one given to you. There will not be you can only get it if do the reading problems... but the reading helps you understand how to think through the problem. CS 152 L16: Midterm I Review 41

42 Mid-term: There may be math... No memorization: If we ask about Amdahl s Law, we will show its definition lecture slide. Understanding is needed: A problem may require you to apply equation to a design, etc. You may need to do: simple algebra and calculus, add a few numbers by hand, etc. Cannot use electronic devices... more administrative info after we do some content. CS 152 L16: Midterm I Review 42

43 When is it? Where is it? Ground rules. 9:30 AM sharp, Tuesday May 1st, 306 Soda. Every-other-seat seating, except for the front rows, where every-seat is permitted. No blue-books needed. We will be handing out a paper test. Pencil is preferred. Pencils 10:55 AM, so we can collect papers before next class comes in. CS 152 L16: Midterm I Review 43

44 When is it? Where is it? Ground rules. No use of calculators, smartphones, laptops, etc... during the exam. Closed-book, closed-notes. Just pencils, erasers. No consulting with students. Restroom breaks are OK, but you ll still need to hand in your 10:55. Questions are reserved for serious concerns about a bug in the question. CS 152 L16: Midterm I Review 44

45 Today - Midterm II Review Session Study Tips HW 2, problem by problem (if there is time) HKN CS 152 L16: Midterm I Review 45

46 On Thursday Mid-term II... See you there! 46

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 52 Computer Architecture and Engineering Lecture 26 Mid-Term II Review 26--3 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs52/ CS 52 L26: Mid-Term

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 18 Advanced Processors II 2006-10-31 John Lazzaro (www.cs.berkeley.edu/~lazzaro) Thanks to Krste Asanovic... TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 27 Multiprocessors 2005-4-28 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last Time:

More information

CS152 Computer Architecture and Engineering. Lecture 15 Virtual Memory Dave Patterson. John Lazzaro. www-inst.eecs.berkeley.

CS152 Computer Architecture and Engineering. Lecture 15 Virtual Memory Dave Patterson. John Lazzaro. www-inst.eecs.berkeley. CS152 Computer Architecture and Engineering Lecture 15 Virtual Memory 2004-10-21 Dave Patterson (www.cs.berkeley.edu/~patterson) John Lazzaro (www.cs.berkeley.edu/~lazzaro) www-inst.eecs.berkeley.edu/~cs152/

More information

CS 152 Computer Architecture and Engineering Lecture 1 Single Cycle Design

CS 152 Computer Architecture and Engineering Lecture 1 Single Cycle Design CS 152 Computer Architecture and Engineering Lecture 1 Single Cycle Design 2014-1-21 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: 1 Today s lecture

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 6 Superpipelining + Branch Prediction 2014-2-6 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play:

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 14 - Cache Design and Coherence 2014-3-6 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: 1 Today:

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 20 Advanced Processors I 2005-4-5 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 17 Advanced Processors I 2005-10-27 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: David Marquardt and Udam Saini www-inst.eecs.berkeley.edu/~cs152/

More information

CS Digital Systems Project Laboratory. Lecture 10: Advanced Processors II

CS Digital Systems Project Laboratory. Lecture 10: Advanced Processors II CS 194-6 Digital Systems Project Laboratory Lecture 10: Advanced Processors II 2008-11-24 John Lazzaro (www.cs.berkeley.edu/~lazzaro) Thanks to Krste Asanovic... TA: Greg Gibeling www-inst.eecs.berkeley.edu/~cs194-6/

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 7 Pipelining I 2005-9-20 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: David Marquardt and Udam Saini www-inst.eecs.berkeley.edu/~cs152/ Office Hours

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 12 -- Virtual Memory 2014-2-27 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: CS 152 L12: Virtual

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 10 -- Cache I 2014-2-20 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: CS 152 L10: Cache I UC

More information

CS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example

CS252 Graduate Computer Architecture Lecture 6. Recall: Software Pipelining Example CS252 Graduate Computer Architecture Lecture 6 Tomasulo, Implicit Register Renaming, Loop-Level Parallelism Extraction Explicit Register Renaming John Kubiatowicz Electrical Engineering and Computer Sciences

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 7 Pipelining I 2006-9-19 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/ Last Time: ipod

More information

CMSC411 Fall 2013 Midterm 1

CMSC411 Fall 2013 Midterm 1 CMSC411 Fall 2013 Midterm 1 Name: Instructions You have 75 minutes to take this exam. There are 100 points in this exam, so spend about 45 seconds per point. You do not need to provide a number if you

More information

Hardware-Based Speculation

Hardware-Based Speculation Hardware-Based Speculation Execute instructions along predicted execution paths but only commit the results if prediction was correct Instruction commit: allowing an instruction to update the register

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 52 Computer Architecture and Engineering Lecture 6 -- Midterm I Review Session 204-3-3 John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs52/ Play: CS 52 L6: Midterm

More information

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer

Page # CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer Architecture,

More information

Review: Compiler techniques for parallelism Loop unrolling Ÿ Multiple iterations of loop in software:

Review: Compiler techniques for parallelism Loop unrolling Ÿ Multiple iterations of loop in software: CS152 Computer Architecture and Engineering Lecture 17 Dynamic Scheduling: Tomasulo March 20, 2001 John Kubiatowicz (http.cs.berkeley.edu/~kubitron) lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/

More information

Four Steps of Speculative Tomasulo cycle 0

Four Steps of Speculative Tomasulo cycle 0 HW support for More ILP Hardware Speculative Execution Speculation: allow an instruction to issue that is dependent on branch, without any consequences (including exceptions) if branch is predicted incorrectly

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 22 Advanced Processors III 2005-4-12 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false. CS 2410 Mid term (fall 2015) Name: Question 1 (10 points) Indicate which of the following statements is true and which is false. (1) SMT architectures reduces the thread context switch time by saving in

More information

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining CS 152 Computer rchitecture and Engineering Lecture 4 Pipelining 2014-1-30 John Lazzaro (not a prof - John is always OK) T: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: 1 otorola 68000 Next week

More information

Cache Organizations for Multi-cores

Cache Organizations for Multi-cores Lecture 26: Recap Announcements: Assgn 9 (and earlier assignments) will be ready for pick-up from the CS front office later this week Office hours: all day next Tuesday Final exam: Wednesday 13 th, 7:50-10am,

More information

Multi-cycle Instructions in the Pipeline (Floating Point)

Multi-cycle Instructions in the Pipeline (Floating Point) Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining

More information

The Evolution of Microprocessors. Per Stenström

The Evolution of Microprocessors. Per Stenström The Evolution of Microprocessors Per Stenström Processor (Core) Processor (Core) Processor (Core) L1 Cache L1 Cache L1 Cache L2 Cache Microprocessor Chip Memory Evolution of Microprocessors Multicycle

More information

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer

Page 1. CISC 662 Graduate Computer Architecture. Lecture 8 - ILP 1. Pipeline CPI. Pipeline CPI (I) Pipeline CPI (II) Michela Taufer CISC 662 Graduate Computer Architecture Lecture 8 - ILP 1 Michela Taufer Pipeline CPI http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson

More information

Lecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S

Lecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S Lecture 6 MIPS R4000 and Instruction Level Parallelism Computer Architectures 521480S Case Study: MIPS R4000 (200 MHz, 64-bit instructions, MIPS-3 instruction set) 8 Stage Pipeline: first half of fetching

More information

Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring

More information

Instruction-Level Parallelism and Its Exploitation

Instruction-Level Parallelism and Its Exploitation Chapter 2 Instruction-Level Parallelism and Its Exploitation 1 Overview Instruction level parallelism Dynamic Scheduling Techniques es Scoreboarding Tomasulo s s Algorithm Reducing Branch Cost with Dynamic

More information

EN2910A: Advanced Computer Architecture Topic 02: Review of classical concepts

EN2910A: Advanced Computer Architecture Topic 02: Review of classical concepts EN2910A: Advanced Computer Architecture Topic 02: Review of classical concepts Prof. Sherief Reda School of Engineering Brown University S. Reda EN2910A FALL'15 1 Classical concepts (prerequisite) 1. Instruction

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 15 Cache II 2005-3-8 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last Time: Locality

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer rchitecture and Engineering Lecture 10 Pipelining III 2005-2-17 John Lazzaro (www.cs.berkeley.edu/~lazzaro) Ts: Ted Hong and David arquardt www-inst.eecs.berkeley.edu/~cs152/ Last time:

More information

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations?

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations? Brown University School of Engineering ENGN 164 Design of Computing Systems Professor Sherief Reda Homework 07. 140 points. Due Date: Monday May 12th in B&H 349 1. [30 points] Consider the non-pipelined

More information

Compiler Optimizations. Lecture 7 Overview of Superscalar Techniques. Memory Allocation by Compilers. Compiler Structure. Register allocation

Compiler Optimizations. Lecture 7 Overview of Superscalar Techniques. Memory Allocation by Compilers. Compiler Structure. Register allocation Lecture 7 Overview of Superscalar Techniques CprE 581 Computer Systems Architecture, Fall 2013 Reading: Textbook, Ch. 3 Complexity-Effective Superscalar Processors, PhD Thesis by Subbarao Palacharla, Ch.1

More information

CS425 Computer Systems Architecture

CS425 Computer Systems Architecture CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order

More information

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville Lecture : Exploiting ILP with SW Approaches Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Basic Pipeline Scheduling and Loop

More information

5008: Computer Architecture

5008: Computer Architecture 5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage

More information

ECE 2300 Digital Logic & Computer Organization. Caches

ECE 2300 Digital Logic & Computer Organization. Caches ECE 23 Digital Logic & Computer Organization Spring 217 s Lecture 2: 1 Announcements HW7 will be posted tonight Lab sessions resume next week Lecture 2: 2 Course Content Binary numbers and logic gates

More information

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5

More information

EECS Digital Design

EECS Digital Design EECS 150 -- Digital Design Lecture 11-- Processor Pipelining 2010-2-23 John Wawrzynek Today s lecture by John Lazzaro www-inst.eecs.berkeley.edu/~cs150 1 Today: Pipelining How to apply the performance

More information

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013 ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013 Professor: Sherief Reda School of Engineering, Brown University 1. [from Debois et al. 30 points] Consider the non-pipelined implementation of

More information

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring

More information

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.

Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2. Instruction-Level Parallelism and its Exploitation: PART 2 Hardware-based speculation (2.6) Multiple-issue plus static scheduling = VLIW (2.7) Multiple-issue, dynamic scheduling, and speculation (2.8)

More information

CS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes

CS433 Midterm. Prof Josep Torrellas. October 16, Time: 1 hour + 15 minutes CS433 Midterm Prof Josep Torrellas October 16, 2014 Time: 1 hour + 15 minutes Name: Alias: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 19 Advanced Processors III 2006-11-2 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/ 1 Last

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Review: Evaluating Branch Alternatives. Lecture 3: Introduction to Advanced Pipelining. Review: Evaluating Branch Prediction

Review: Evaluating Branch Alternatives. Lecture 3: Introduction to Advanced Pipelining. Review: Evaluating Branch Prediction Review: Evaluating Branch Alternatives Lecture 3: Introduction to Advanced Pipelining Two part solution: Determine branch taken or not sooner, AND Compute taken branch address earlier Pipeline speedup

More information

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline

NOW Handout Page 1. Review from Last Time #1. CSE 820 Graduate Computer Architecture. Lec 8 Instruction Level Parallelism. Outline CSE 820 Graduate Computer Architecture Lec 8 Instruction Level Parallelism Based on slides by David Patterson Review Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism

More information

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1 CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level

More information

Views of Memory. Real machines have limited amounts of memory. Programmer doesn t want to be bothered. 640KB? A few GB? (This laptop = 2GB)

Views of Memory. Real machines have limited amounts of memory. Programmer doesn t want to be bothered. 640KB? A few GB? (This laptop = 2GB) CS6290 Memory Views of Memory Real machines have limited amounts of memory 640KB? A few GB? (This laptop = 2GB) Programmer doesn t want to be bothered Do you think, oh, this computer only has 128MB so

More information

CISC 662 Graduate Computer Architecture. Lecture 10 - ILP 3

CISC 662 Graduate Computer Architecture. Lecture 10 - ILP 3 CISC 662 Graduate Computer Architecture Lecture 10 - ILP 3 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Lecture 4: Introduction to Advanced Pipelining

Lecture 4: Introduction to Advanced Pipelining Lecture 4: Introduction to Advanced Pipelining Prepared by: Professor David A. Patterson Computer Science 252, Fall 1996 Edited and presented by : Prof. Kurt Keutzer Computer Science 252, Spring 2000 KK

More information

Lec 13: Linking and Memory. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements

Lec 13: Linking and Memory. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements Lec 13: Linking and Memory Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University PA 2 is out Due on Oct 22 nd Announcements Prelim Oct 23 rd, 7:30-9:30/10:00 All content up to Lecture on Oct

More information

Processor: Superscalars Dynamic Scheduling

Processor: Superscalars Dynamic Scheduling Processor: Superscalars Dynamic Scheduling Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 (Princeton),

More information

Keywords and Review Questions

Keywords and Review Questions Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Lec20.1 Fall 2003 Head

More information

CPE 631 Lecture 09: Instruction Level Parallelism and Its Dynamic Exploitation

CPE 631 Lecture 09: Instruction Level Parallelism and Its Dynamic Exploitation Lecture 09: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Instruction

More information

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory

More information

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 http://inst.eecs.berkeley.edu/~cs152/sp08 The problem

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!

More information

Memory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky

Memory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky Memory Hierarchy, Fully Associative Caches Instructor: Nick Riasanovsky Review Hazards reduce effectiveness of pipelining Cause stalls/bubbles Structural Hazards Conflict in use of datapath component Data

More information

CS146 Computer Architecture. Fall Midterm Exam

CS146 Computer Architecture. Fall Midterm Exam CS146 Computer Architecture Fall 2002 Midterm Exam This exam is worth a total of 100 points. Note the point breakdown below and budget your time wisely. To maximize partial credit, show your work and state

More information

CS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue and Register Renaming Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://wwweecsberkeleyedu/~krste

More information

CS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes

CS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes CS433 Midterm Prof Josep Torrellas October 19, 2017 Time: 1 hour + 15 minutes Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your time.

More information

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

ELE 375 Final Exam Fall, 2000 Prof. Martonosi ELE 375 Final Exam Fall, 2000 Prof. Martonosi Question Score 1 /10 2 /20 3 /15 4 /15 5 /10 6 /20 7 /20 8 /25 9 /30 10 /30 11 /30 12 /15 13 /10 Total / 250 Please write your answers clearly in the space

More information

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering and Computer Sciences University of California at

More information

Pipelining and Exploiting Instruction-Level Parallelism (ILP)

Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste

More information

/ : Computer Architecture and Design Fall Midterm Exam October 16, Name: ID #:

/ : Computer Architecture and Design Fall Midterm Exam October 16, Name: ID #: 16.482 / 16.561: Computer Architecture and Design Fall 2014 Midterm Exam October 16, 2014 Name: ID #: For this exam, you may use a calculator and two 8.5 x 11 double-sided page of notes. All other electronic

More information

C 1. Last time. CSE 490/590 Computer Architecture. Complex Pipelining I. Complex Pipelining: Motivation. Floating-Point Unit (FPU) Floating-Point ISA

C 1. Last time. CSE 490/590 Computer Architecture. Complex Pipelining I. Complex Pipelining: Motivation. Floating-Point Unit (FPU) Floating-Point ISA CSE 490/590 Computer Architecture Complex Pipelining I Steve Ko Computer Sciences and Engineering University at Buffalo Last time Virtual address caches Virtually-indexed, physically-tagged cache design

More information

Computer Architecture 计算机体系结构. Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II. Chao Li, PhD. 李超博士

Computer Architecture 计算机体系结构. Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II. Chao Li, PhD. 李超博士 Computer Architecture 计算机体系结构 Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II Chao Li, PhD. 李超博士 SJTU-SE346, Spring 2018 Review Hazards (data/name/control) RAW, WAR, WAW hazards Different types

More information

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods

This Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods 10-1 Dynamic Scheduling 10-1 This Set Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods Not yet complete. (Material below may

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 13 Memory and Interfaces 2005-3-1 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last

More information

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation TDT4255 Lecture 9: ILP and speculation Donn Morrison Department of Computer Science 2 Outline Textbook: Computer Architecture: A Quantitative Approach, 4th ed Section 2.6: Speculation Section 2.7: Multiple

More information

Super Scalar. Kalyan Basu March 21,

Super Scalar. Kalyan Basu March 21, Super Scalar Kalyan Basu basu@cse.uta.edu March 21, 2007 1 Super scalar Pipelines A pipeline that can complete more than 1 instruction per cycle is called a super scalar pipeline. We know how to build

More information

Advanced issues in pipelining

Advanced issues in pipelining Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one

More information

Lecture 7 Pipelining. Peng Liu.

Lecture 7 Pipelining. Peng Liu. Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt

More information

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates

More information

OPEN BOOK, OPEN NOTES. NO COMPUTERS, OR SOLVING PROBLEMS DIRECTLY USING CALCULATORS.

OPEN BOOK, OPEN NOTES. NO COMPUTERS, OR SOLVING PROBLEMS DIRECTLY USING CALCULATORS. CS/ECE472 Midterm #2 Fall 2008 NAME: Student ID#: OPEN BOOK, OPEN NOTES. NO COMPUTERS, OR SOLVING PROBLEMS DIRECTLY USING CALCULATORS. Your signature is your promise that you have not cheated and will

More information

Computer Systems Architecture I. CSE 560M Lecture 18 Guest Lecturer: Shakir James

Computer Systems Architecture I. CSE 560M Lecture 18 Guest Lecturer: Shakir James Computer Systems Architecture I CSE 560M Lecture 18 Guest Lecturer: Shakir James Plan for Today Announcements No class meeting on Monday, meet in project groups Project demos < 2 weeks, Nov 23 rd Questions

More information

Good luck and have fun!

Good luck and have fun! Midterm Exam October 13, 2014 Name: Problem 1 2 3 4 total Points Exam rules: Time: 90 minutes. Individual test: No team work! Open book, open notes. No electronic devices, except an unprogrammed calculator.

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds.

Review: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds. Performance 980 98 982 983 984 985 986 987 988 989 990 99 992 993 994 995 996 997 998 999 2000 7/4/20 CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches Instructor: Michael Greenbaum

More information

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one

More information

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)

ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not

More information

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by:

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row (~every 8 msec). Static RAM may be

More information