CS 152 Computer Architecture and Engineering

1 CS 152 Computer Architecture and Engineering Lecture Midterm II Review Session John Lazzaro (not a prof - John is always OK) TA: Eric Love www-inst.eecs.berkeley.edu/~cs152/ CS 152 L16: Midterm I Review

2 Today - Midterm II Review Session: Study Tips; HW 2, problem by problem (if there is time); HKN

3 Name: SSID: CS152 Midterm II, May 1st, 2014. # Points. All the work is my own. I have no prior knowledge of the exam contents, aside from guidance from class staff. I will not share the contents with others in CS152 who have not taken it yet. Signature: Please write clearly, and put your name on each page. Please abide by word limits. Good luck! Tot 100. Eric Love, John Lazzaro

4 What does it cover? Lectures 9 onward. Focus will be on problems that require you to do a task (write a small program, trace through execution, etc.) that demonstrates that you understand a concept. [...] No transistor-level questions (DRAM and SRAM cells, etc.). Time for a quick walk-through...

5 CS 152 Computer Architecture and Engineering Lecture 9 -- Memory

6 Latency is not the same as bandwidth! (CS 152 L9: Memory) What if we want all of the bits? In row access time (55 ns) we can do 22 transfers at 400 MT/s. 16-bit chip bus -> 22 x 16 = 352 bits, far fewer than the bits in a full row. Now the row access time looks fast! Thus, push to faster DRAM interfaces. [Figure: DRAM array with a 13-bit row address input feeding a row decoder; 8192 rows; usable bits (tester found good bits in bigger array); bits delivered by sense amps; select requested bits, send off the chip.]
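The arithmetic on this slide can be checked with a short sketch. The function names are illustrative only; the 55 ns, 400 MT/s, and 16-bit figures come from the slide.

```python
def transfers_in_window(window_ns, mega_transfers_per_s):
    """Whole transfers that fit in a time window.

    MT/s times ns gives transfers * 1e6 * 1e-9, so divide by 1000.
    Integer arithmetic avoids floating-point rounding.
    """
    return (window_ns * mega_transfers_per_s) // 1000


def bits_in_window(window_ns, mega_transfers_per_s, bus_width_bits):
    """Bits that cross the chip bus during the window."""
    return transfers_in_window(window_ns, mega_transfers_per_s) * bus_width_bits


# Slide numbers: 55 ns row access time, 400 MT/s interface, 16-bit chip bus.
print(transfers_in_window(55, 400))  # 22
print(bits_in_window(55, 400, 16))   # 352
```

Doubling the transfer rate doubles the bits moved in the same 55 ns window, which is the motivation the slide gives for faster DRAM interfaces.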

7 CS 152 Computer Architecture and Engineering Lecture Cache I

8 Latency: A closer look (CS L8: Cache, UC Regents Fall 2008 UCB). Read latency: Time to return first byte of a random access.

               Reg    L1 Inst  L1 Data  L2     DRAM   Disk
Size           1K     64K      32K      512K   256M   80G
Latency (sec)  0.6n   1.9n     1.9n     6.9n   100n   12.5m
Hz             1.6G   533M     533M     145M   10M    80

Architect's latency toolkit: (1) Parallelism. Request data from N 1-bit-wide memories at the same time. Overlaps latency cost for all N bits. Provides N times the bandwidth. Requests to N memory banks (interleaving) have potential of N times the bandwidth. (2) Pipeline memory. If memory has N cycles of latency, issue a request each cycle, receive it N cycles later.
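A minimal sketch of toolkit item (2), assuming an idealized memory that accepts one request per cycle and never stalls. The function names are hypothetical.

```python
def unpipelined_cycles(requests, latency):
    """Each request waits for the previous one to finish."""
    return requests * latency


def pipelined_cycles(requests, latency):
    """One request issues per cycle; the last one returns `latency` cycles
    after it issues, so the total is latency + (requests - 1)."""
    if requests == 0:
        return 0
    return latency + requests - 1


# 100 requests to a memory with 10 cycles of latency.
print(unpipelined_cycles(100, 10))  # 1000
print(pipelined_cycles(100, 10))    # 109
```

The latency of any single request is unchanged (still 10 cycles here); only the throughput improves.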

9 CS 152 Computer Architecture and Engineering Lecture Cache II

10 Issue #4: When to write to lower level... (CS 152 L11: Cache II)

Policy                                      Write-Through   Write-Back
Do read misses produce writes?              No              Yes
Do repeated writes make it to lower level?  Yes             No

Write-Through: Data written to cache block also written to lower-level memory.
Write-Back: Write data only to the cache. Update lower level when a block falls out of the cache.

Related issue: Do writes to blocks not in the cache get put in the cache ("write-allocate") or not?
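The two rows of the policy table can be reproduced with a toy model. This is a sketch under loud assumptions: a cache holding a single block, write-allocate for both policies, and a made-up access-trace format. It is not a model of any real cache.

```python
def lower_level_writes(accesses, policy):
    """Count writes sent to the lower level by a one-block cache.

    accesses: list of (op, block_address) with op "R" or "W" (hypothetical format).
    policy:   "write-through" or "write-back".
    Assumes write-allocate: a write miss brings the block into the cache.
    """
    cached = None   # address of the block currently held, if any
    dirty = False   # only meaningful for write-back
    writes = 0
    for op, addr in accesses:
        if cached != addr:                      # miss: replace the block
            if policy == "write-back" and dirty:
                writes += 1                     # evicted dirty block is written back
            cached, dirty = addr, False
        if op == "W":
            if policy == "write-through":
                writes += 1                     # every write goes to the lower level
            else:
                dirty = True                    # write stays in the cache for now
    return writes


trace = [("W", 0), ("W", 0), ("W", 0), ("R", 1)]
print(lower_level_writes(trace, "write-through"))  # 3
print(lower_level_writes(trace, "write-back"))     # 1
```

In the write-back run, the single lower-level write is triggered by the read miss on block 1, and the three repeated writes to block 0 collapse into one. That matches the Yes/No pattern in the table.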

11 CS 152 Computer Architecture and Engineering Lecture Virtual Memory

12 The TLB caches page table entries (CS 152 L15: Virtual Memory, UC Regents Fall 2006 UCB). [Figure: the virtual address (page no., 10-bit offset) is looked up in the TLB; on a miss, the Page Table Base Reg for the ASID plus the page no. index into the page table, whose entries hold V, Access Rights, and PA; the physical address is the frame no. plus the same 10-bit offset.] Page table is located in physical memory. MIPS handles TLB misses in software (random replacement). Other machines use hardware. In this example, physical and virtual pages must be the same size! V=0 pages either reside on disk or have not yet been allocated. OS handles V=0: page fault.
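The lookup path on this slide can be sketched as follows. The dictionary-based TLB and page table, the `PageFault` exception, and the entry layout are all assumptions made for illustration; only the 10-bit offset and the V bit come from the slide.

```python
OFFSET_BITS = 10  # from the slide: 10-bit page offset


class PageFault(Exception):
    """Raised when the page table entry is missing or has V=0 (OS would handle it)."""


def translate(vaddr, tlb, page_table):
    """Translate a virtual address, filling the TLB from the page table on a miss.

    tlb:        dict mapping virtual page number -> frame number (hypothetical model)
    page_table: dict mapping virtual page number -> {"valid": bool, "frame": int}
    """
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if vpn not in tlb:                           # TLB miss: walk the page table
        entry = page_table.get(vpn)
        if entry is None or not entry["valid"]:
            raise PageFault(vpn)
        tlb[vpn] = entry["frame"]                # cache the page table entry
    return (tlb[vpn] << OFFSET_BITS) | offset    # offset passes through unchanged


page_table = {3: {"valid": True, "frame": 7}}
tlb = {}
print(translate((3 << 10) | 5, tlb, page_table))  # 7173, i.e. (7 << 10) | 5
print(tlb)                                        # {3: 7}
```

Because the offset bits pass through untranslated, virtual and physical pages must be the same size, as the slide notes.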

13 CS 152 Computer Architecture and Engineering Lecture 13 - Synchronization

14 Non-blocking consumer synchronization (CS 152 L24: Multiprocessors, UC Regents Fall 2006 UCB)

Another atomic read-modify-write instruction:

    Compare&Swap(Rt, Rs, m)
        if (Rt == M[m])
        then M[m] = Rs; Rs = Rt;   /* do swap */
        else                       /* do not swap */

Assuming sequential consistency: MEMBARs not shown...

    try:  LW   R3, head(r0)               ; Load queue head into R3
    spin: LW   R4, tail(r0)               ; Load queue tail into R4
          BEQ  R4, R3, spin               ; If queue empty, wait
          LW   R5, 0(R3)                  ; Read x from queue into R5
          ADDI R6, R3, 4                  ; Shift head by one word
          Compare&Swap R3, R6, head(r0)   ; Try to update head
          BNE  R3, R6, try                ; If not success, try again

If R3 != R6, another thread got here first, so we must try again. If thread swaps out before Compare&Swap, no latency problem; this code only "holds the lock" for one instruction!
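The same consumer loop can be sketched in Python. Python has no hardware Compare&Swap, so the `Memory` class below emulates one with a lock; that class, its method names, and the choice to return `None` on an empty queue (instead of spinning as the slide code does) are assumptions for illustration.

```python
import threading


class Memory:
    """Word-addressed memory with an emulated atomic compare-and-swap."""

    def __init__(self):
        self._words = {}
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def load(self, addr):
        with self._guard:
            return self._words.get(addr, 0)

    def store(self, addr, value):
        with self._guard:
            self._words[addr] = value

    def compare_and_swap(self, addr, expected, new):
        """If M[addr] == expected, set M[addr] = new and report success."""
        with self._guard:
            if self._words.get(addr, 0) == expected:
                self._words[addr] = new
                return True
            return False


def try_dequeue(mem, head_addr, tail_addr):
    """Non-blocking dequeue; returns None if the queue is empty."""
    while True:
        head = mem.load(head_addr)                # LW R3, head
        if head == mem.load(tail_addr):           # queue empty
            return None
        value = mem.load(head)                    # LW R5, 0(R3)
        if mem.compare_and_swap(head_addr, head, head + 4):
            return value                          # we advanced head: x is ours
        # another consumer got here first, so try again
```

A consumer that loses the race simply reloads `head` and retries, so no thread ever blocks while holding a lock on the queue.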

15 CS 152 Computer Architecture and Engineering Lecture 14 - Cache Design and Coherence

16 Writes from 10,000 feet... for write-thru L1 caches (CS 152 L14: Cache Design and Coherency). [Figure: CPU0 and CPU1, each with a cache and a snooper, share a memory bus to the shared main memory hierarchy.] A two-state protocol (cache lines are "valid" or "invalid"). To a first order, reads will "just work" if write-thru caches implement this policy:
1. Writing CPU takes control of bus.
2. Address to be written is invalidated in all other caches. Reads will no longer hit in cache and get stale data.
3. Write is sent to main memory. Reads will cache miss, retrieve new value from main memory.
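The three steps above can be sketched as a toy simulation. The `Bus` and `Cache` classes are hypothetical, bus arbitration (step 1) is implicit because the code is sequential, and keeping the written value in the writer's own cache is an assumption the slide does not spell out.

```python
class Bus:
    """Shared bus: holds main memory and the list of snooping caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []


class Cache:
    """Write-through cache with a two-state (valid/invalid) protocol.

    A line is 'valid' if its address is present in self.lines.
    """

    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                     # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        for other in self.bus.caches:                  # step 2: snoopers invalidate
            if other is not self:
                other.lines.pop(addr, None)
        self.lines[addr] = value                       # assumption: writer keeps the line
        self.bus.memory[addr] = value                  # step 3: write-through to memory


bus = Bus()
cpu0, cpu1 = Cache(bus), Cache(bus)
print(cpu0.read(8))   # 0, and the line is now valid in cpu0's cache
cpu1.write(8, 42)     # invalidates cpu0's copy, updates memory
print(cpu0.read(8))   # 42, re-fetched after the miss
```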

17 CS 152 Computer Architecture and Engineering Lecture Advanced CPUs (CS 152 L15: Superscalars and Scoreboards)

18 Split pipelines: a write-after-write hazard (CS L9: Advanced Processors I, UC Regents Fall 2008 UCB). The pipeline splits after the RF stage, feeding functional units with different latencies. WAW hazard: DIV R1, R2, R3 followed by SUB R1, R2, R3. If long latency DIV and short latency SUB are sent to parallel pipes, SUB may finish first. Solution: SUB detects R1 clash in decode stage and stalls, via a pipe-write scoreboard.
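A minimal sketch of the stall decision, assuming a simplified in-order machine that issues at most one instruction per cycle and tracks only destination registers. The function name, the trace format, and the latency values are illustrative, not taken from the lecture.

```python
def issue_cycles(program, latency):
    """Return the cycle at which each instruction leaves decode.

    program: list of (opcode, destination_register) pairs (hypothetical format).
    latency: dict mapping opcode -> cycles until its register write completes.
    The scoreboard records, per register, when the in-flight write will finish;
    an instruction stalls in decode until any earlier write to its destination
    has completed, which prevents the WAW hazard.
    """
    pending = {}   # register -> cycle when the outstanding write completes
    cycle = 0
    issued = []
    for op, dest in program:
        cycle = max(cycle, pending.get(dest, 0))   # stall on a destination clash
        issued.append(cycle)
        pending[dest] = cycle + latency[op]
        cycle += 1                                  # next instruction decodes a cycle later
    return issued


lat = {"DIV": 10, "SUB": 1}
print(issue_cycles([("DIV", "R1"), ("SUB", "R1")], lat))  # [0, 10]
print(issue_cycles([("DIV", "R1"), ("SUB", "R4")], lat))  # [0, 1]
```

When SUB targets a different register it issues immediately; only the clash on R1 forces the stall.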

19 Superscalar R machine (CS L9: Advanced Processors I, UC Regents Fall 2008 UCB). [Figure: superscalar datapath across the IF (Fetch), ID (Decode), EX (ALU), MEM, and WB stages: PC and Sequencer, 64-bit fetch from Instr Mem, Instruction Issue Logic, a RegFile with four read ports (rs1-rs4, rd1-rd4) and two write ports (ws1/wd1/WE1, ws2/wd2/WE2), and two 32-bit ALUs.]

20 CS 152 Computer Architecture and Engineering Lecture Networks, Routers, Google

21 6 key parameters, compared at three scales: one server, an 80-server rack, and an array. To get more DRAM and disk capacity, you must work on a scale larger than a single server. But as you do, latency and bandwidth degrade, because network performance << a server bus, and because the array network is under-provisioned. Exception: disk latency is roughly scale-independent.

22 CS 152 Computer Architecture and Engineering Lecture Dynamic Scheduling I Thanks to Krste Asanovic...

23 Given an endless supply of registers... (CS 152 L18: Dynamic Scheduling I) Rename architected registers (Ri, Fi) to new physical registers (PRi, PFi) on each write.

          ADDI R1,R0,64
    Loop: LD   F0,0(R1)
          ADDD F4,F0,F2
          SD   F4,0(R1)
          SUBI R1,R1,8
          BNEZ R1,Loop
          NOP

Renamed (R1 -> PR01, F0 -> PF00):

           ADDI PR01,PR00,64
           LD   PF00 0(PR01)
           ADDD PF04, PF00, PF02
           SD   PF04, 0(PR01)
           SUBI PR11, PR01, 8
           BEQZ PR11 ENDLOOP
    ITER2: LD   PF10 0(PR11)
           ADDD PF14, PF10, PF02
           SD   PF14, 0(PR11)
           SUBI PR21, PR11, 8
           BEQZ PR21 ENDLOOP
    ITER3: LD   PF20 0(PR21)
           [...]

What was gained? An instruction may execute once all of its source registers have been written.
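The renaming rule (a fresh physical register on every write, sources read through the current map) can be sketched in a few lines. The tuple format and the `P0, P1, ...` naming are assumptions for illustration and differ from the slide's PRi/PFi numbering.

```python
from itertools import count


def rename(instructions):
    """Rename architected registers to fresh physical registers on each write.

    instructions: list of (opcode, dest_or_None, tuple_of_source_registers),
                  a hypothetical format chosen for this sketch.
    Returns the same list with every register replaced by a physical name.
    """
    fresh = count()
    mapping = {}  # architected register -> current physical register

    def lookup(reg):
        if reg not in mapping:               # value live on entry to the code
            mapping[reg] = f"P{next(fresh)}"
        return mapping[reg]

    renamed = []
    for op, dest, sources in instructions:
        srcs = tuple(lookup(s) for s in sources)   # read sources before remapping dest
        new_dest = None
        if dest is not None:
            new_dest = f"P{next(fresh)}"           # endless supply: never reuse
            mapping[dest] = new_dest
        renamed.append((op, new_dest, srcs))
    return renamed


body = [
    ("LD", "F0", ("R1",)),
    ("ADDD", "F4", ("F0", "F2")),
    ("SD", None, ("F4", "R1")),
    ("SUBI", "R1", ("R1",)),
]
for instr in rename(body + body):   # two unrolled iterations
    print(instr)
```

In the output, the second iteration's LD writes a different physical register than the first, so the two iterations only depend on each other through SUBI's result.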

24 CS 152 Computer Architecture and Engineering Lecture Dynamic Scheduling II

25 Rename stage close-up: (1) Allocates new physical registers for destinations, (2) Looks up physical register numbers for sources, (3) Handles rename dependences within the 4 issuing instructions, all in one clock cycle! Input: 4 instructions specifying architected registers. Output: 12 physical register numbers: 1 destination and 2 sources for each of the 4 instructions to be issued. [Figure: Rename stage with integer and floating-point register rename maps (content-addressable memories), 80 in-flight instructions, and timestamped saved map state for mis-speculation recovery.]

26 CS 152 Computer Architecture and Engineering Lecture Dynamic Scheduling III

27 Micro-op translation example... (CS 152 L20: Dynamic Scheduling III, UC Regents Fall 2006 UCB)

ADC m32, r32:   // for a simple m32 address mode

Becomes:

    LD  T1 0(EBX);      // EBX register points to m32
    ADD T1, T1, CF;     // CF is carry flag from EFLAGS
    ADD T1, T1, r32;    // Add the specified register
    ST  0(EBX) T1;      // Store result back to m32

Instruction traces of IA-32 programs show most executed instructions require 4 or fewer micro-ops. Translation for these ops is cast into logic gates, often over several pipeline cycles.
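To see that the four micro-ops add up to the ADC behavior, here is a small emulation sketch. The dictionaries standing in for memory and registers are assumptions, results wrap at 32 bits, and updating EFLAGS after the add is deliberately left out.

```python
MASK32 = 0xFFFFFFFF


def adc_m32_r32(memory, regs, carry_flag, r32):
    """Emulate ADC m32, r32 as the slide's four micro-ops.

    memory: dict address -> 32-bit word (hypothetical model)
    regs:   dict register name -> 32-bit value; EBX holds the m32 address
    Flag updates are omitted in this sketch.
    """
    addr = regs["EBX"]
    t1 = memory[addr]                  # LD  T1, 0(EBX)
    t1 = (t1 + carry_flag) & MASK32    # ADD T1, T1, CF
    t1 = (t1 + regs[r32]) & MASK32     # ADD T1, T1, r32
    memory[addr] = t1                  # ST  0(EBX), T1
    return t1


memory = {0x100: 5}
regs = {"EBX": 0x100, "EAX": 7}
print(adc_m32_r32(memory, regs, carry_flag=1, r32="EAX"))  # 13
```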

28 CS 152 Computer Architecture and Engineering Lecture Dataflow

29 Dataflow stages. Idea: Write dataflow programs that reference physical registers, to execute on this machine. Input: Instructions that reference physical registers. Scoreboard: Tracks writes to physical registers. [Figure: Ebox clusters, each with Execute units and an 80-entry Reg File, plus the register scoreboard.]

30 CS 152 Computer Architecture and Engineering Lecture GPU + SIMD + Vectors I

31 Pure data move opcode: VBROADCASTSS copies the value X0 read from m32 into every element of DEST. [Figure: VBROADCASTSS Operation (VEX.256 encoded version)] Or, part of a math opcode: PHADDD adds adjacent pairs within each source, giving (from the high element down) Y2 + Y3, Y0 + Y1, X2 + X3, X0 + X1. [Figure: Horizontal Data Movement in PHADDD]
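The data movement in the two figures can be sketched with plain Python lists, where index 0 is the lowest lane. These functions only model the lane shuffling for the 8-lane broadcast and the 4-lane (128-bit) horizontal add; the helper names are made up for this sketch.

```python
def vbroadcastss(x0, lanes=8):
    """Model of VBROADCASTSS (256-bit form): copy one 32-bit value to all lanes."""
    return [x0] * lanes


def _wrap32(value):
    """Wrap to a signed 32-bit integer, as packed dword addition does."""
    return ((value + 2**31) % 2**32) - 2**31


def phaddd(x, y):
    """Model of PHADDD on two 4-lane vectors (index 0 = lowest lane).

    Low half of the result comes from pairs in x, high half from pairs in y.
    """
    return [
        _wrap32(x[0] + x[1]),
        _wrap32(x[2] + x[3]),
        _wrap32(y[0] + y[1]),
        _wrap32(y[2] + y[3]),
    ]


print(vbroadcastss(1.5))                         # [1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5]
print(phaddd([1, 2, 3, 4], [10, 20, 30, 40]))    # [3, 7, 30, 70]
```

Read from the highest lane down, the second result is Y2 + Y3, Y0 + Y1, X2 + X3, X0 + X1, matching the figure.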

32 CS 152 Computer Architecture and Engineering Lecture GPU + SIMD + Vectors II

33 Assume MacBook Air x 768 screen... We are all zoomed in on Google Maps. Top pyramid image is 4K x 4K... Idea: Keep only a 1386 x 768 window of top images in RAM... [Figure: view frustum from eyepoint, near plane; texture regions marked Needed and Clipped.] Lets us cache a 1024 x 1024 window of the 11 PB Earth map in 34.7 MB!

Type and Size
Full Mipmap   682KB   2.7MB   42.7MB   2.7GB    10923TB
Clipmap       682KB   1.1MB   2.2MB    3.7MB    9.1MB
Clipmap       682KB   2.7MB   6.7MB    12.7MB   34.7MB
Clipmap       682KB   2.7MB   18.7MB   42.7MB   131.7MB

34 Zoom all the way in... Bottom stack image shows the smallest part of the 1 mile sq. patch of the Earth of any stack image. Graphics hardware displays bottom stack image, which fills MacBook Air display. Hardware interpolation of stack levels. [Figure: clip-map pyramid and clip-map stack around the clip-map center; map levels (not to scale) against the portion of the source image covered by each map level (one axis shown) and the viewer position in the source texture.]

35 CS 152 Computer Architecture and Engineering Lecture Voxel Processing

36 After processing... A 3-D matrix of cubes, in object space (X,Y,Z). 8-bit density value stored for each cube (0 = "air"). 256^3 = 16 MB = 10 inch cube (for 1mm voxels). mm voxels? 8 GB. Interesting to computer architects because n^3 grows so quickly! [Figure: object space partitioned among 64 processors; smallest box shown is a 16-cube.]
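A quick check of the cubic growth, assuming one byte per voxel as the slide states. The 2048-per-side grid is a hypothetical size picked only because it reproduces the slide's 8 GB figure.

```python
MM_PER_INCH = 25.4


def voxel_bytes(side, bytes_per_voxel=1):
    """Storage for a side x side x side grid of voxels (8-bit density each)."""
    return side ** 3 * bytes_per_voxel


def cube_inches(side, voxel_mm):
    """Edge length of the scanned cube in inches."""
    return side * voxel_mm / MM_PER_INCH


print(voxel_bytes(256) / 2**20)          # 16.0 (MB)
print(round(cube_inches(256, 1.0), 2))   # 10.08 (inches)
print(voxel_bytes(2048) / 2**30)         # 8.0 (GB), hypothetical 2048^3 grid
```

Doubling the resolution along each axis multiplies storage by 8, so three doublings take the grid from 16 MB to 8 GB.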

37 CS 152 Computer Architecture and Engineering Lecture Digital Imaging

38 Camera interface to the outside world (CS 250 L12: CMOS Imagers, UC Regents Fall 2012 UCB). Simple power hookup. Serial port to control the camera. 8-bit Dout port, 54 MHz Clk. 1280 x 15 fps, 640 x 30 fps, YCrCb 4:2:2. [Figure 4: 44-Ball icsp Assignment, top view (ball down), columns A-G by rows 1-7.] [Figure: hookup diagram. Two-wire serial interface (SADDR, SCLK, SDATA), master clock (CLKIN), power-on reset (RESET#), OE#, and STANDBY inputs; 1.7V to 3.6V I/O digital supply (VDDQ, DGNDQ), 2.8V core digital supply (VDD, DGND), 2.8V analog supply (VAA, VAAPIX, AGND); DOUT[7:0], FRAME_VALID, LINE_VALID, and PIXCLK go to the CMOS camera port; STROBE goes to the xenon or LED flash driver.]

39 AWARE-2: Array of 98 phone camera modules (14 M-pixel each). 1.3 G-pixel at 3 frames/sec.

40 On Thursday Mid-term II... Ground rules...

41 Mid-term: How to do well... Problem intro often features a lecture slide. If you have to teach yourself that slide during the test, you're starting out behind. Getting the problem correct requires thinking on your feet to do a new design or analyze one given to you. There will not be "you can only get it if you do the reading" problems... but the reading helps you understand how to think through the problem.

42 Mid-term: There may be math... No memorization: If we ask about Amdahl's Law, we will show its definition from the lecture slide. Understanding is needed: A problem may require you to apply an equation to a design, etc. You may need to do: simple algebra and calculus, add a few numbers by hand, etc. Cannot use electronic devices... more administrative info after we do some content.

43 When is it? Where is it? Ground rules. 9:30 AM sharp, Thursday May 1st, 306 Soda. Every-other-seat seating, except for the front rows, where every-seat is permitted. No blue-books needed. We will be handing out a paper test. Pencil is preferred. Pencils down at 10:55 AM, so we can collect papers before next class comes in.

44 When is it? Where is it? Ground rules. No use of calculators, smartphones, laptops, etc... during the exam. Closed-book, closed-notes. Just pencils, erasers. No consulting with students. Restroom breaks are OK, but you'll still need to hand in your exam at 10:55. Questions are reserved for serious concerns about a bug in the question.

45 Today - Midterm II Review Session: Study Tips; HW 2, problem by problem (if there is time); HKN

46 On Thursday Mid-term II... See you there!