CS 152 Computer Architecture and Engineering
1 CS 152 Computer Architecture and Engineering Lecture Midterm II Review Session John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: CS 152 L16: Midterm I Review 1
2 Today - Midterm II Review Session
Study Tips
HW 2, problem by problem (if there is time)
HKN
3 CS152 Midterm II, May 1st, 2014
Name: SSID:
All the work is my own. I have no prior knowledge of the exam contents, aside from guidance from class staff. I will not share the contents with others in CS152 who have not taken it yet. Signature:
Please write clearly, and put your name on each page. Please abide by word limits. Good luck!
Score table: # / Points, Tot 100
Eric Love, John Lazzaro
4 What does it cover?
Lectures 9 onward.
Focus will be on problems that require you to do a task (write a small program, trace through execution, etc.) that demonstrates that you understand a concept. [...]
No transistor-level questions (DRAM and SRAM cells, etc.).
Time for a quick walk-through...
5 CS 152 Computer Architecture and Engineering Lecture 9 -- Memory
6 Latency is not the same as bandwidth!
Thus, the push to faster DRAM interfaces.
What if we want all of the bits? In one row access time (55 ns) we can do 22 transfers at 400 MT/s. With a 16-bit chip bus, that is 22 x 16 = 352 bits, far fewer than the row delivers. Now the row access time looks fast!
Figure labels: 13-bit row address input; 1 of [...] decoder; 8192 rows; columns of usable bits (tester found good bits in bigger array); bits delivered by sense amps; select requested bits, send off the chip.
CS 152 L9: Memory
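The arithmetic on the slide above can be checked with a few lines of Python. This is only a sketch: the constants are the slide's figures, while the variable and function names are made up for illustration.

```python
# Figures taken from the slide; all names here are illustrative only.
ROW_ACCESS_TIME_S = 55e-9      # row access time: 55 ns
TRANSFER_RATE_PER_S = 400e6    # interface rate: 400 MT/s
BUS_WIDTH_BITS = 16            # 16-bit chip bus


def transfers_during_row_access(access_time_s, rate_per_s):
    """Number of bus transfers that fit in one row access time."""
    # Round before truncating so floating-point noise cannot turn 22 into 21.
    return int(round(access_time_s * rate_per_s, 6))


transfers = transfers_during_row_access(ROW_ACCESS_TIME_S, TRANSFER_RATE_PER_S)
bits_moved = transfers * BUS_WIDTH_BITS
print(transfers, bits_moved)
```

The point of the comparison is that the interface, not the row access, limits how many of the row's bits leave the chip in that time.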
7 CS 152 Computer Architecture and Engineering Lecture Cache I
8 Cache latency: a closer look
Read latency: time to return the first byte of a random access.

                 Reg    L1 Inst  L1 Data  L2     DRAM   Disk
Size             1K     64K      32K      512K   256M   80G
Latency (sec)    0.6n   1.9n     1.9n     6.9n   100n   12.5m
Hz               1.6G   533M     533M     145M   10M    80

Latency (cycles): the values were lost in transcription (only a trailing "E+07" survives).

Architect's latency toolkit:
(1) Parallelism. Request data from N 1-bit-wide memories at the same time. This overlaps the latency cost for all N bits and provides N times the bandwidth. Requests to N memory banks (interleaving) have the potential of N times the bandwidth.
(2) Pipeline memory. If memory has N cycles of latency, issue a request each cycle and receive it N cycles later.
CS L8: Cache, UC Regents Fall 2008 UCB
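Item (2) of the toolkit can be made concrete with a small cycle count. The sketch below assumes an idealized memory that accepts one new request per cycle; the function names are hypothetical and not from the lecture.

```python
def unpipelined_cycles(num_requests, latency_cycles):
    """Each request waits for the previous one to finish."""
    return num_requests * latency_cycles


def pipelined_cycles(num_requests, latency_cycles):
    """One request issues per cycle; each result arrives latency_cycles later."""
    if num_requests == 0:
        return 0
    # The first result takes the full latency, then one result per cycle.
    return latency_cycles + (num_requests - 1)


# Example: 8 requests to a memory with 10 cycles of latency.
print(unpipelined_cycles(8, 10), pipelined_cycles(8, 10))
```

Latency per request is unchanged in both cases; only the throughput improves, which is the distinction the toolkit relies on.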
9 CS 152 Computer Architecture and Engineering Lecture Cache II
10 Issue #4: When to write to lower level...
Write-Through: data written to a cache block is also written to lower-level memory.
Write-Back: write data only to the cache; update the lower level when a block falls out of the cache.

Policy                                        Write-Through   Write-Back
Do read misses produce writes?                No              Yes
Do repeated writes make it to lower level?    Yes             No

Related issue: do writes to blocks not in the cache get put in the cache ("write-allocate") or not?
CS 152 L11: Cache II
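The two rows of the policy comparison can be reproduced with a toy model. This is a minimal sketch, assuming a direct-mapped cache of one-word blocks with write-allocate; the class and its fields are invented for illustration and do not come from the lecture.

```python
class TinyCache:
    """Direct-mapped cache of one-word blocks that counts lower-level writes."""

    def __init__(self, num_lines, policy):
        assert policy in ("write-through", "write-back")
        self.num_lines = num_lines
        self.policy = policy
        self.lines = {}              # index -> (tag, dirty)
        self.lower_level_writes = 0

    def _split(self, addr):
        return addr % self.num_lines, addr // self.num_lines

    def _evict_if_dirty(self, index, tag):
        old = self.lines.get(index)
        if old is not None and old[0] != tag and old[1]:
            self.lower_level_writes += 1   # dirty block falls out of the cache

    def read(self, addr):
        index, tag = self._split(addr)
        old = self.lines.get(index)
        if old is None or old[0] != tag:   # read miss
            self._evict_if_dirty(index, tag)
            self.lines[index] = (tag, False)

    def write(self, addr):
        index, tag = self._split(addr)
        old = self.lines.get(index)
        if old is None or old[0] != tag:   # write miss: allocate the block
            self._evict_if_dirty(index, tag)
        if self.policy == "write-through":
            self.lower_level_writes += 1   # every write also goes below
            self.lines[index] = (tag, False)
        else:
            self.lines[index] = (tag, True)  # only mark the block dirty


def run(policy):
    cache = TinyCache(num_lines=4, policy=policy)
    for _ in range(3):
        cache.write(0)                     # repeated writes to one block
    after_writes = cache.lower_level_writes
    cache.read(4)                          # read miss that conflicts with block 0
    return after_writes, cache.lower_level_writes - after_writes


print(run("write-through"), run("write-back"))
```

Each call to `run` returns the lower-level writes caused by the three repeated writes, then the writes caused by the conflicting read miss.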
11 CS 152 Computer Architecture and Engineering Lecture Virtual Memory
12 The TLB caches page table entries.
Figure (address translation): the virtual address is split into a virtual page number and a page offset (marked 10 in the figure). The page number, together with the page table base register (for the current ASID), indexes the page table, which is located in physical memory. Each page table entry holds a valid bit (V), access rights, and a physical frame address. The TLB maps page numbers directly to frames. The physical address is the physical frame number followed by the same offset.
MIPS handles TLB misses in software (random replacement). Other machines use hardware.
In this example, physical and virtual pages must be the same size!
V=0 pages either reside on disk or have not yet been allocated. The OS handles V=0 with a page fault.
CS 152 L15: Virtual Memory, UC Regents Fall 2006 UCB
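The translation path on the slide above can be sketched in a few lines. The code assumes a 10-bit page offset (reading the "10" in the figure as a bit width), a single-level page table held in a dictionary, and no access-rights check; all names are hypothetical.

```python
OFFSET_BITS = 10  # assumption: the "10" in the figure is the offset width


class PageFault(Exception):
    """Raised for V=0 pages; on a real machine the OS would handle this."""


def translate(vaddr, tlb, page_table):
    """Translate a virtual address, filling the TLB on a miss."""
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if vpn not in tlb:                        # TLB miss: consult the page table
        entry = page_table.get(vpn)
        if entry is None or not entry["valid"]:
            raise PageFault(vpn)
        tlb[vpn] = entry["frame"]             # cache the page table entry
    return (tlb[vpn] << OFFSET_BITS) | offset


page_table = {
    3: {"valid": True, "frame": 7},
    4: {"valid": False, "frame": None},       # on disk or not yet allocated
}
tlb = {}
print(translate(3 * 1024 + 5, tlb, page_table))
```

The offset passes through unchanged, which is why physical and virtual pages must be the same size in this scheme.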
13 CS 152 Computer Architecture and Engineering Lecture 13 - Synchronization
14 Non-blocking consumer synchronization
Another atomic read-modify-write instruction:

    Compare&Swap(Rt, Rs, m)
      if (Rt == M[m])
      then M[m] = Rs; Rs = Rt;   /* do swap */
      else                       /* do not swap */

Assuming sequential consistency: MEMBARs not shown...

    try:  LW   R3, head(r0)               ; Load queue head into R3
    spin: LW   R4, tail(r0)               ; Load queue tail into R4
          BEQ  R4, R3, spin               ; If queue empty, wait
          LW   R5, 0(R3)                  ; Read x from queue into R5
          ADDI R6, R3, 4                  ; Shift head by one word
          Compare&Swap R3, R6, head(r0)   ; Try to update head
          BNE  R3, R6, try                ; If not success, try again

If R3 != R6, another thread got here first, so we must try again.
If the thread swaps out before Compare&Swap, there is no latency problem; this code only holds the lock for one instruction!
CS 152 L24: Multiprocessors, UC Regents Fall 2006 UCB
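The retry loop above can be mirrored in Python as a sketch. Python has no hardware compare-and-swap, so the `Memory` class below fakes atomicity with a lock; that class, the `HEAD` and `TAIL` keys, and returning `None` on an empty queue (instead of spinning, as the slide's code does) are all assumptions made for illustration. The function also reports success as a boolean rather than through a register.

```python
import threading


class Memory:
    """Word-addressed memory with a simulated atomic compare-and-swap."""

    def __init__(self):
        self._words = {}
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def load(self, addr):
        return self._words[addr]

    def store(self, addr, value):
        self._words[addr] = value

    def compare_and_swap(self, addr, expected, new):
        """Write new only if the word still holds expected; report success."""
        with self._guard:
            if self._words[addr] == expected:
                self._words[addr] = new
                return True
            return False


HEAD, TAIL = "head", "tail"  # hypothetical locations of the queue pointers


def try_dequeue(mem):
    """Consumer side of the queue; returns None if the queue is empty."""
    while True:
        head = mem.load(HEAD)                 # LW R3, head(r0)
        tail = mem.load(TAIL)                 # LW R4, tail(r0)
        if head == tail:
            return None                       # the slide spins here instead
        value = mem.load(head)                # LW R5, 0(R3)
        if mem.compare_and_swap(HEAD, head, head + 4):
            return value                      # head advanced by one word
        # Another consumer moved head first: retry from the top.


mem = Memory()
mem.store(100, "a")
mem.store(104, "b")
mem.store(HEAD, 100)
mem.store(TAIL, 108)
print(try_dequeue(mem), try_dequeue(mem), try_dequeue(mem))
```

As on the slide, no consumer ever holds a lock across the read of the queue entry; a consumer that loses the race simply reloads the head and tries again.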
15 CS 152 Computer Architecture and Engineering Lecture 14 - Cache Design and Coherence
16 Writes from 10,000 feet... for write-thru L1
Figure: CPU0 and CPU1, each with a cache and a snooper, share a memory bus to the shared main memory hierarchy.
For write-thru caches, a two-state protocol (cache lines are "valid" or "invalid"):
1. Writing CPU takes control of bus.
2. Address to be written is invalidated in all other caches. Reads will no longer hit in cache and get stale data.
3. Write is sent to main memory. Reads will cache miss and retrieve the new value from main memory.
To a first order, reads will "just work" if write-thru caches implement this policy.
CS 152 L14: Cache Design and Coherency
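The three steps above can be sketched as a tiny simulation. It assumes one-word cache lines, unbounded caches, and a bus that handles one write at a time; the `Bus` and `Cache` classes are hypothetical names, not the lecture's design.

```python
class Bus:
    """Shared memory bus with snooping caches attached."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, writer, addr, value):
        # Step 1 is implicit: one call at a time models one bus owner.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)   # step 2: invalidate other copies
        self.memory[addr] = value             # step 3: write goes to main memory


class Cache:
    """Write-through cache whose lines are either present (valid) or absent."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from main memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)


bus = Bus()
cpu0, cpu1 = Cache(bus), Cache(bus)
bus.memory[8] = 1
print(cpu0.read(8), cpu1.read(8))   # both caches now hold the old value
cpu0.write(8, 2)
print(cpu1.read(8))                 # cpu1 misses and sees the new value
```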
17 CS 152 Computer Architecture and Engineering Lecture Advanced CPUs
18 Split pipelines: a write-after-write hazard.
The pipeline splits after the RF stage, feeding functional units with different latencies.

    DIV R1, R2, R3
    SUB R1, R2, R3

WAW hazard: if the long-latency DIV and the short-latency SUB are sent to parallel pipes, SUB may finish first.
Solution: SUB detects the R1 clash in the decode stage and stalls, via a pipe-write scoreboard.
CS L9: Advanced Processors I, UC Regents Fall 2008 UCB
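A pipe-write scoreboard of the kind mentioned above can be sketched as a set of registers with writes still in flight. The class below is a hypothetical minimal model: it tracks destination registers only and ignores source operands and pipe latencies.

```python
class PipeWriteScoreboard:
    """Tracks registers that have a write pending in some pipe."""

    def __init__(self):
        self.pending = set()

    def try_issue(self, dest):
        """Return True if the instruction may leave decode, False to stall."""
        if dest in self.pending:
            return False          # an earlier write to dest is still in flight
        self.pending.add(dest)
        return True

    def writeback(self, dest):
        """Called when a pipe writes dest back to the register file."""
        self.pending.discard(dest)


sb = PipeWriteScoreboard()
print(sb.try_issue("R1"))   # DIV R1, R2, R3 issues
print(sb.try_issue("R1"))   # SUB R1, R2, R3 must stall in decode
sb.writeback("R1")          # DIV completes
print(sb.try_issue("R1"))   # SUB may now issue
```

Stalling SUB until DIV has written R1 guarantees that the later instruction's value is the one left in the register.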
19 Superscalar R machine
Figure: superscalar datapath with stages IF (Fetch), ID (Decode), EX (ALU), MEM, WB. Labels: PC and Sequencer; instruction memory (32-bit Addr, 64-bit Data); Instruction Issue Logic; register file with read ports rs1-rs4 / rd1-rd4 and write ports ws1/wd1/WE1 and ws2/wd2/WE2; two 32-bit ALUs (inputs A and B, op, outputs Y and R).
CS L9: Advanced Processors I, UC Regents Fall 2008 UCB
20 CS 152 Computer Architecture and Engineering Lecture Networks, Routers, Google
21 Six key parameters, compared at three scales: one server, an 80-server rack, and an array.
To get more DRAM and disk capacity, you must work on a scale larger than a single server. But as you do, latency and bandwidth degrade, because network performance << a server bus, and because the array network is under-provisioned.
Exception: disk latency is roughly scale-independent.
22 CS 152 Computer Architecture and Engineering Lecture Dynamic Scheduling I
Thanks to Krste Asanovic...
23 Given an endless supply of registers...
Rename architected registers (Ri, Fi) to new physical registers (PRi, PFi) on each write.

Original code:

           ADDI R1,R0,64
    Loop:  LD   F0,0(R1)
           ADDD F4,F0,F2
           SD   F4,0(R1)
           SUBI R1,R1,8
           BNEZ R1,Loop
           NOP

Renamed code (R1 -> PR01, F0 -> PF00, ...):

           ADDI PR01,PR00,64
           LD   PF00,0(PR01)
           ADDD PF04,PF00,PF02
           SD   PF04,0(PR01)
           SUBI PR11,PR01,8
           BEQZ PR11,ENDLOOP
    ITER2: LD   PF10,0(PR11)
           ADDD PF14,PF10,PF02
           SD   PF14,0(PR11)
           SUBI PR21,PR11,8
           BEQZ PR21,ENDLOOP
    ITER3: LD   PF20,0(PR21)
           [...]

What was gained? An instruction may execute once all of its source registers have been written.
CS 152 L18: Dynamic Scheduling I
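The renaming rule on the slide (a fresh physical register on every write, sources looked up in the current map) can be sketched as follows. The tuple encoding of instructions and the flat P0, P1, ... naming are simplifications chosen for this example; the slide uses separate PRi and PFi names.

```python
def rename(instructions, initial_map, free_regs):
    """Rename (op, dest, sources) tuples in program order.

    Every write gets a fresh physical register; every source reads the
    most recent mapping of its architected register.
    """
    mapping = dict(initial_map)
    free = iter(free_regs)
    renamed = []
    for op, dest, sources in instructions:
        new_sources = tuple(mapping[s] for s in sources)  # look up first
        new_dest = None
        if dest is not None:
            new_dest = next(free)                         # then allocate
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed


# Loop body from the slide, with the branch omitted for brevity.
body = [
    ("LD", "F0", ("R1",)),
    ("ADDD", "F4", ("F0", "F2")),
    ("SD", None, ("F4", "R1")),
    ("SUBI", "R1", ("R1",)),
]
free_list = ["P%d" % i for i in range(2, 20)]
result = rename(body * 2, {"R1": "P0", "F2": "P1"}, free_list)
for line in result:
    print(line)
```

In the output, the second iteration's LD reads the register written by the first iteration's SUBI and writes a register of its own, so it no longer has to wait for the first iteration's ADDD or SD to finish with F0.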
24 CS 152 Computer Architecture and Engineering Lecture Dynamic Scheduling II
25 Rename stage close-up
(1) Allocates new physical registers for destinations. (2) Looks up physical register numbers for sources. (3) Handles rename dependences within the 4 issuing instructions, all in one clock cycle!
Input: 4 instructions specifying architected registers.
Output: 12 physical register numbers: 1 destination and 2 sources for each of the 4 instructions to be issued.
Figure labels: integer register rename; floating-point register rename; map content-addressable memories; saved map state (timestamped, for mis-speculation recovery); 80 in-flight instructions; internal register numbers.
26 CS 152 Computer Architecture and Engineering Lecture Dynamic Scheduling III
27 Micro-op translation example...
ADC m32, r32 (for a simple m32 address mode) becomes:

    LD  T1, 0(EBX)     // EBX register points to m32
    ADD T1, T1, CF     // CF is carry flag from EFLAGS
    ADD T1, T1, r32    // Add the specified register
    ST  0(EBX), T1     // Store result back to m32

Instruction traces of IA-32 programs show that most executed instructions require 4 or fewer micro-ops. Translations for these ops are cast into logic gates, often over several pipeline cycles.
CS 152 L20: Dynamic Scheduling III, UC Regents Fall 2006 UCB
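The four micro-ops above can be walked through one at a time in Python to see that they add the register and the carry into the memory word. This is an illustrative sketch only: it uses EAX as the r32 operand (an arbitrary choice), models memory as a dictionary, and does not update EFLAGS, which the real instruction would.

```python
MASK32 = 0xFFFFFFFF


def adc_m32_r32(memory, regs, carry_flag, src_reg="EAX"):
    """Execute the slide's micro-op sequence for ADC m32, r32."""
    addr = regs["EBX"]                    # EBX points to m32
    t1 = memory[addr]                     # LD  T1, 0(EBX)
    t1 = (t1 + carry_flag) & MASK32       # ADD T1, T1, CF
    t1 = (t1 + regs[src_reg]) & MASK32    # ADD T1, T1, r32
    memory[addr] = t1                     # ST  0(EBX), T1
    return t1


memory = {0x100: 5}
regs = {"EBX": 0x100, "EAX": 7}
print(adc_m32_r32(memory, regs, carry_flag=1))
```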
28 CS 152 Computer Architecture and Engineering Lecture Dataflow
29 Dataflow stages of [...]
Idea: Write dataflow programs that reference physical registers, to execute on this machine.
Input: Instructions that reference physical registers.
Scoreboard: Tracks writes to physical registers.
Figure labels: Ebox Cluster 0; Ebox Cluster [...]; register scoreboard; Execute; Reg File (80); Media.
30 CS 152 Computer Architecture and Engineering Lecture GPU + SIMD + Vectors I
31 Pure data move opcode.
Figure: VBROADCASTSS Operation (VEX.256 encoded version). The value X0 at m32 is copied into every element of DEST.
Or, part of a math opcode.
Figure: Horizontal Data Movement in PHADDD. Inputs are X3 X2 X1 X0 and Y3 Y2 Y1 Y0; adjacent pairs are added, giving (high to low) Y2 + Y3, Y0 + Y1, X2 + X3, X0 + X1.
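Both data movements on the slide can be emulated on Python lists. The sketch below stores elements from low to high (index 0 is X0), treats each element as a 32-bit value with wraparound, and uses function names chosen for this example; it models only the data movement, not the instruction encodings.

```python
MASK32 = 0xFFFFFFFF


def vbroadcastss_256(x0):
    """Copy one 32-bit element into all 8 elements of a 256-bit destination."""
    return [x0] * 8


def phaddd(x, y):
    """Horizontal add of adjacent 32-bit pairs.

    x = [X0, X1, X2, X3] and y = [Y0, Y1, Y2, Y3], low element first.
    Returns [X0+X1, X2+X3, Y0+Y1, Y2+Y3], low element first.
    """
    return [
        (x[0] + x[1]) & MASK32,
        (x[2] + x[3]) & MASK32,
        (y[0] + y[1]) & MASK32,
        (y[2] + y[3]) & MASK32,
    ]


print(vbroadcastss_256(9))
print(phaddd([1, 2, 3, 4], [10, 20, 30, 40]))
```

Read from high to low, the second result matches the figure's order: Y2 + Y3, Y0 + Y1, X2 + X3, X0 + X1.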
32 CS 152 Computer Architecture and Engineering Lecture GPU + SIMD + Vectors II
33 Assume MacBook Air [...] x 768 screen...
We are all zoomed in on Google Maps.
Top pyramid image is 4K x 4K...
Idea: Keep only a 1386 x 768 window of top images in RAM...
Figure labels: Eyepoint, Near, Needed, Clipped.
Lets us cache a 1024 x 1024 window of the 11 PB Earth map in 34.7 MB!

Type and Size (column headings and clip sizes lost in transcription)
Full Mipmap   682KB   2.7MB   42.7MB   2.7GB    10923TB
Clipmap       682KB   1.1MB   2.2MB    3.7MB    9.1MB
Clipmap       682KB   2.7MB   6.7MB    12.7MB   34.7MB
Clipmap       682KB   2.7MB   18.7MB   42.7MB   131.7MB
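The sizes in the table above are consistent with a simple storage model, sketched here. Two things are assumptions inferred from the numbers rather than stated on the slide: texels take 2 bytes each, and the largest source image is 2^26 texels on a side. A clip-map keeps every mip level no larger than the clip size in full (the pyramid) and only a clip-sized window of each larger level (the stack).

```python
BYTES_PER_TEXEL = 2  # assumption inferred from the table, not stated on the slide


def clipmap_bytes(side, clip):
    """Storage for a square power-of-two image whose levels are clipped to clip x clip."""
    total_texels = 0
    level_side = side
    while level_side >= 1:
        stored = min(level_side, clip)   # stack levels keep only a window
        total_texels += stored * stored
        level_side //= 2                 # next coarser mip level
    return total_texels * BYTES_PER_TEXEL


def full_mipmap_bytes(side):
    """A full mipmap is the special case where nothing is clipped."""
    return clipmap_bytes(side, clip=side)


KB, MB, TB = 2 ** 10, 2 ** 20, 2 ** 40
print(full_mipmap_bytes(512) / KB)         # compare with 682KB
print(full_mipmap_bytes(2 ** 26) / TB)     # compare with 10923TB
print(clipmap_bytes(2 ** 26, 1024) / MB)   # compare with 34.7MB
```

Under these assumptions the full mipmap grows with the square of the image side, while the clip-map adds only one fixed-size window per extra level, which is why the last column stays in the megabyte range.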
34 Zoom all the way in...
Figure: clip-map pyramid and clip-map stack (map levels, not to scale), with the clip-map center marked. Axes: portion of source image covered by map level (one axis shown); viewer position in source texture. Units shown: pixels, sq. miles, miles.
Bottom stack image shows the smallest part of the Earth of any stack image (labelled: 1 mile sq. patch).
Graphics hardware displays the bottom stack image, which fills the MacBook Air display.
Hardware interpolation of stack levels.
35 CS 152 Computer Architecture and Engineering Lecture Voxel Processing
36 After processing...
A 3-D matrix of cubes, in object space (X,Y,Z). An 8-bit density value is stored for each cube (0 = "air").
256^3 = 16 MB = 10 inch cube (for 1 mm voxels). [...] mm voxels? 8 GB.
Interesting to computer architects because n^3 grows so quickly!
Figure: OBJECT SPACE PARTITIONED AMONG 64 PROCESSORS. Smallest box shown is a 16-cube.
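The n^3 growth is easy to check numerically. The sketch below assumes one byte per voxel, as on the slide. The edge length of 2048 used for the second figure is an inference: it is the size that gives 8 GB at one byte per voxel, not a number the slide states.

```python
def voxel_bytes(n, bytes_per_voxel=1):
    """Memory for an n x n x n voxel grid."""
    return n ** 3 * bytes_per_voxel


MIB, GIB = 2 ** 20, 2 ** 30
print(voxel_bytes(256) / MIB)    # 256^3 voxels at 1 byte each, in MB
print(voxel_bytes(2048) / GIB)   # 8x finer along each axis, in GB
print(256 / 25.4)                # 256 mm expressed in inches
```

Making the voxels 8 times finer along each axis multiplies the storage by 8^3 = 512, which is the point of the slide's remark about n^3.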
37 CS 152 Computer Architecture and Engineering Lecture Digital Imaging
38 Camera interface to the outside world
Simple power hookup. Serial port to control the camera.
8-bit Dout port, 54 MHz clk. Output modes: 1280 x [...] at 15 fps, 640 x [...] at 30 fps, YCrCb 4:2:2.

Figure 4: 44-Ball icsp Assignment, Top View (Ball Down). Columns are A to G. Rows with fewer than seven entries have empty positions whose columns were lost in transcription.
Row 1: DGND, DOUT4, DOUT5, DOUT7, DOUT_LSB0, SADDR, DGND
Row 2: DOUT3, VDD, DOUT6, VDDQ, DOUT_LSB1, VDD, LINE_VALID
Row 3: DOUT2, DOUT1, DGNDQ, DGNDQ, FRAME_VALID, RESET#
Row 4: VDDQ, DOUT0, STANDBY, VDDQ
Row 5: CLK_IN, PIXCLK, DGNDQ, DGNDQ, OE#, STROBE
Row 6: SCLK, VDD, VDDQ, VDDQ, NC, VAA, NC
Row 7: DGND, DGNDQ, SDATA, DGND, VAAPIX, NC, AGND

Hookup diagram labels:
Power: 1.7V 3.6V I/O digital (VDDQ); 2.8V core digital (VDD); 2.8V analog (VAA, VAAPIX). Grounds: DGNDQ, DGND (digital GND), AGND (analog GND).
Inputs: two-wire serial interface (SADDR, SCLK, SDATA); master clock (CLKIN); power-on reset (RESET#); OE#; STANDBY.
Outputs: DOUT[7:0], FRAME_VALID, LINE_VALID, PIXCLK to the CMOS camera port; STROBE to the xenon or LED flash driver.
CS 250 L12: CMOS Imagers, UC Regents Fall 2012 UCB
39 AWARE-2: Array of 98 phone camera modules (14 M-pixel). 1.3 G-pixel, 3 frames/sec.
40 On Thursday: Mid-term II... Ground rules...
41 Mid-term: How to do well...
Problem intro often features a lecture slide. If you have to teach yourself that slide during the test, you're starting out behind.
Getting the problem correct requires thinking on your feet to do a new design or analyze one given to you.
There will not be "you can only get it if you do the reading" problems... but the reading helps you understand how to think through the problem.
42 Mid-term: There may be math...
No memorization: If we ask about Amdahl's Law, we will show the lecture slide with its definition.
Understanding is needed: A problem may require you to apply an equation to a design, etc.
You may need to do: simple algebra and calculus, add a few numbers by hand, etc.
Cannot use electronic devices... more administrative info after we do some content.
43 When is it? Where is it? Ground rules.
9:30 AM sharp, Thursday May 1st, 306 Soda.
Every-other-seat seating, except for the front rows, where every-seat seating is permitted.
No blue-books needed. We will be handing out a paper test.
Pencil is preferred.
Pencils down at 10:55 AM, so we can collect papers before the next class comes in.
44 When is it? Where is it? Ground rules.
No use of calculators, smartphones, laptops, etc. during the exam.
Closed-book, closed-notes. Just pencils, erasers.
No consulting with students.
Restroom breaks are OK, but you'll still need to hand in your exam at 10:55.
Questions are reserved for serious concerns about a bug in the question.
45 Today - Midterm II Review Session
Study Tips
HW 2, problem by problem (if there is time)
HKN
46 On Thursday: Mid-term II... See you there!
CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory
More informationCS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25
CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 http://inst.eecs.berkeley.edu/~cs152/sp08 The problem
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationPage 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution
CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste!
More informationMemory Hierarchy, Fully Associative Caches. Instructor: Nick Riasanovsky
Memory Hierarchy, Fully Associative Caches Instructor: Nick Riasanovsky Review Hazards reduce effectiveness of pipelining Cause stalls/bubbles Structural Hazards Conflict in use of datapath component Data
More informationCS146 Computer Architecture. Fall Midterm Exam
CS146 Computer Architecture Fall 2002 Midterm Exam This exam is worth a total of 100 points. Note the point breakdown below and budget your time wisely. To maximize partial credit, show your work and state
More informationCS 152 Computer Architecture and Engineering. Lecture 13 - Out-of-Order Issue and Register Renaming
CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue and Register Renaming Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://wwweecsberkeleyedu/~krste
More informationCS433 Midterm. Prof Josep Torrellas. October 19, Time: 1 hour + 15 minutes
CS433 Midterm Prof Josep Torrellas October 19, 2017 Time: 1 hour + 15 minutes Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 4 Questions. Please budget your time.
More informationELE 375 Final Exam Fall, 2000 Prof. Martonosi
ELE 375 Final Exam Fall, 2000 Prof. Martonosi Question Score 1 /10 2 /20 3 /15 4 /15 5 /10 6 /20 7 /20 8 /25 9 /30 10 /30 11 /30 12 /15 13 /10 Total / 250 Please write your answers clearly in the space
More informationCS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming
CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering and Computer Sciences University of California at
More informationPipelining and Exploiting Instruction-Level Parallelism (ILP)
Pipelining and Exploiting Instruction-Level Parallelism (ILP) Pipelining and Instruction-Level Parallelism (ILP). Definition of basic instruction block Increasing Instruction-Level Parallelism (ILP) &
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationCS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism
CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste
More information/ : Computer Architecture and Design Fall Midterm Exam October 16, Name: ID #:
16.482 / 16.561: Computer Architecture and Design Fall 2014 Midterm Exam October 16, 2014 Name: ID #: For this exam, you may use a calculator and two 8.5 x 11 double-sided page of notes. All other electronic
More informationC 1. Last time. CSE 490/590 Computer Architecture. Complex Pipelining I. Complex Pipelining: Motivation. Floating-Point Unit (FPU) Floating-Point ISA
CSE 490/590 Computer Architecture Complex Pipelining I Steve Ko Computer Sciences and Engineering University at Buffalo Last time Virtual address caches Virtually-indexed, physically-tagged cache design
More informationComputer Architecture 计算机体系结构. Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II. Chao Li, PhD. 李超博士
Computer Architecture 计算机体系结构 Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II Chao Li, PhD. 李超博士 SJTU-SE346, Spring 2018 Review Hazards (data/name/control) RAW, WAR, WAW hazards Different types
More informationThis Set. Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods
10-1 Dynamic Scheduling 10-1 This Set Scheduling and Dynamic Execution Definitions From various parts of Chapter 4. Description of Three Dynamic Scheduling Methods Not yet complete. (Material below may
More informationCS 152 Computer Architecture and Engineering
CS 152 Computer Architecture and Engineering Lecture 13 Memory and Interfaces 2005-3-1 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last
More informationDonn Morrison Department of Computer Science. TDT4255 ILP and speculation
TDT4255 Lecture 9: ILP and speculation Donn Morrison Department of Computer Science 2 Outline Textbook: Computer Architecture: A Quantitative Approach, 4th ed Section 2.6: Speculation Section 2.7: Multiple
More informationSuper Scalar. Kalyan Basu March 21,
Super Scalar Kalyan Basu basu@cse.uta.edu March 21, 2007 1 Super scalar Pipelines A pipeline that can complete more than 1 instruction per cycle is called a super scalar pipeline. We know how to build
More informationAdvanced issues in pipelining
Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one
More informationLecture 7 Pipelining. Peng Liu.
Lecture 7 Pipelining Peng Liu liupeng@zju.edu.cn 1 Review: The Single Cycle Processor 2 Review: Given Datapath,RTL -> Control Instruction Inst Memory Adr Op Fun Rt
More informationCS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches
CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates
More informationOPEN BOOK, OPEN NOTES. NO COMPUTERS, OR SOLVING PROBLEMS DIRECTLY USING CALCULATORS.
CS/ECE472 Midterm #2 Fall 2008 NAME: Student ID#: OPEN BOOK, OPEN NOTES. NO COMPUTERS, OR SOLVING PROBLEMS DIRECTLY USING CALCULATORS. Your signature is your promise that you have not cheated and will
More informationComputer Systems Architecture I. CSE 560M Lecture 18 Guest Lecturer: Shakir James
Computer Systems Architecture I CSE 560M Lecture 18 Guest Lecturer: Shakir James Plan for Today Announcements No class meeting on Monday, meet in project groups Project demos < 2 weeks, Nov 23 rd Questions
More informationGood luck and have fun!
Midterm Exam October 13, 2014 Name: Problem 1 2 3 4 total Points Exam rules: Time: 90 minutes. Individual test: No team work! Open book, open notes. No electronic devices, except an unprogrammed calculator.
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationReview: Performance Latency vs. Throughput. Time (seconds/program) is performance measure Instructions Clock cycles Seconds.
Performance 980 98 982 983 984 985 986 987 988 989 990 99 992 993 994 995 996 997 998 999 2000 7/4/20 CS 6C: Great Ideas in Computer Architecture (Machine Structures) Caches Instructor: Michael Greenbaum
More informationComputer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining
Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining Single-Cycle Design Problems Assuming fixed-period clock every instruction datapath uses one
More informationILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5)
Instruction-Level Parallelism and its Exploitation: PART 1 ILP concepts (2.1) Basic compiler techniques (2.2) Reducing branch costs with prediction (2.3) Dynamic scheduling (2.4 and 2.5) Project and Case
More informationMemory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory
More informationMemory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not
More informationCS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck
Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationMain Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row (~every 8 msec). Static RAM may be
More information