University of California, Berkeley College of Engineering

Similar documents
University of California, Berkeley College of Engineering

Department of Electrical Engineering and Computer Sciences Spring 2013 Instructor: Dr. Dan Garcia CS61C Midterm

University of California, Berkeley College of Engineering

University of California, Berkeley College of Engineering

University of California, Berkeley College of Engineering

Department of Electrical Engineering and Computer Sciences Spring 2007 Instructor: Dr. Dan Garcia

University of California, Berkeley College of Engineering

University of California, Berkeley College of Engineering

Department of Electrical Engineering and Computer Sciences Spring 2010 Instructor: Dr. Dan Garcia CS61C Midterm

University of California, Berkeley College of Engineering

University of California, Berkeley College of Engineering

University of California, Berkeley College of Engineering

CS61C Final After the exam, indicate on the line above where you fall in the emotion spectrum between sad & smiley...

University of California, Berkeley College of Engineering

Department of Electrical Engineering and Computer Science Spring 2004 Instructor: Dan Garcia CS61C Midterm

CS61C Midterm After the exam, indicate on the line above where you fall in the emotion spectrum between sad & smiley...

University of California, Berkeley College of Engineering

Midterm. CS64 Spring Midterm Exam

University of California, Berkeley College of Engineering

Computer Architecture I Midterm I

University of California, Berkeley College of Engineering

Computer Architecture I Midterm I (solutions)

CS61C Midterm After the exam, indicate on the line above where you fall in the emotion spectrum between sad & smiley...

University of California College of Engineering Computer Science Division -EECS. CS 152 Midterm I

CS61C Final Exam After the exam, indicate on the line above where you fall in the emotion spectrum between sad & smiley...

CS61C Summer 2013 Final Exam

University of California, Berkeley College of Engineering

M3) Got some numbers, down by the C (10 pts, 20 mins)

CS3350B Computer Architecture Quiz 3 March 15, 2018

CS61C Sample Final. 1 Opening a New Branch. 2 Thread Level Parallelism

Midterm I March 3, 1999 CS152 Computer Architecture and Engineering

CS 61C Summer 2016 Guerrilla Section 4: MIPS CPU (Datapath & Control)

Department of Electrical Engineering and Computer Sciences Fall 2016 Instructors: Randy Katz, Bernhard Boser

Department of Electrical Engineering and Computer Sciences Fall 2016 Instructors: Randy Katz, Bernhard Boser

Department of Electrical Engineering and Computer Sciences Fall 2003 Instructor: Dave Patterson CS 152 Exam #1. Personal Information

CS 61C Fall 2016 Guerrilla Section 4: MIPS CPU (Datapath & Control)

University of California, Berkeley College of Engineering

Guerrilla Session 3: MIPS CPU

Grading Results Total 100

Department of Electrical Engineering and Computer Sciences Fall 2017 Instructors: Randy Katz, Krste Asanovic CS61C MIDTERM 2

University of California, Berkeley College of Engineering

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction

CS61c Summer 2014 Final Exam

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

Problem maximum score 1 35pts 2 22pts 3 23pts 4 15pts Total 95pts

CS61C : Machine Structures

CS61C Summer 2013 Midterm

Computer Architecture I Mid-Term I

COMP303 Computer Architecture Lecture 9. Single Cycle Control

CS 110 Computer Architecture Single-Cycle CPU Datapath & Control

CS 61C, Fall, 2005, Midterm 1, Garcia Question 1: For those about to test, we salute you (11 pts, 20 min.)

CS61C : Machine Structures

CS61C Summer 2012 Final

University of California, Berkeley College of Engineering

UC Berkeley CS61C : Machine Structures

Fall 2010 CS61C Midterm 1 Solutions

University of California, Berkeley College of Engineering

CS61C Summer 2013 Final Review Session

CS 351 Exam 2 Mon. 11/2/2015

CS61c Summer 2014 Final Exam

CENG 3420 Lecture 06: Datapath

CS61C : Machine Structures

CS61C : Machine Structures

Lecture #17: CPU Design II Control

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

F2) Don t let me fault (22 pts, 30 min)

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

CS 61c: Great Ideas in Computer Architecture

361 control.1. EECS 361 Computer Architecture Lecture 9: Designing Single Cycle Control

Midterm I March 12, 2003 CS152 Computer Architecture and Engineering

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

CS61C : Machine Structures

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic

CS3350B Computer Architecture Winter Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2)

Computer Science and Engineering 331. Midterm Examination #1. Fall Name: Solutions S.S.#:

CS61c Summer 2014 Midterm Exam

Review of Last Lecture. CS 61C: Great Ideas in Computer Architecture. MIPS Instruction Representation II. Agenda. Dealing With Large Immediates

MIPS-Lite Single-Cycle Control

Chapter 4. The Processor. Computer Architecture and IC Design Lab

CSE 378 Midterm 2/12/10 Sample Solution

CSEE W3827 Fundamentals of Computer Systems Homework Assignment 3 Solutions

The Processor: Datapath & Control

Nick Weaver Sp 2019 Great Ideas in Computer Architecture MT 1. Print your name:,

CS61C : Machine Structures

CS61c Summer 2014 Midterm Exam

CDA3100 Midterm Exam, Summer 2013

Final Exam Spring 2017

CS61C Midterm 2 Review Session. Problems Only

ECE468 Computer Organization and Architecture. Designing a Single Cycle Datapath

Wawrzynek & Weaver CS 61C. Sp 2018 Great Ideas in Computer Architecture MT 1. Print your name:,

MIPS Functions and Instruction Formats

How to design a controller to produce signals to control the datapath

CSEN 601: Computer System Architecture Summer 2014

Machine Organization & Assembly Language

Winter 2006 FINAL EXAMINATION Auxiliary Gymnasium Tuesday, April 18 7:00pm to 10:00pm

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath

CS61c MIDTERM EXAM: 3/17/99

UC Berkeley CS61C : Machine Structures

Lecture 5: Procedure Calls

UCB CS61C : Machine Structures

Transcription:

University of California, Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences Spring 2014 Instructor: Dan Garcia 2014-05-13! " After the exam, indicate on the line above where you fall in the emotion spectrum between sad smiley... Last Name First Name Student ID Number Login Login First Letter (please circle) Login Second Letter (please circle) The name of your SECTION TA (please circle) Name of the person to your Left Name of the person to your Right All the work is my own. I had no prior knowledge of the exam contents nor will I share the contents with others in CS61C who have not taken it yet. (please sign) cs61c- a b c d e f g h i j k m o p a b c d e f g h i j k l m n o p q r s t u v w x y z Alan Jeffrey Kevin Roger Sagar Shreyas Sung Roa William Instructions (Read Me!) This booklet contains 9 numbered pages including the cover page. Put all answers on these pages; don t hand in any stray pieces of paper. Please turn off all pagers, cell phones beepers. Remove all hats headphones. Place your backpacks, laptops and jackets at the front. Nothing may be placed in the no fly zone spare seat/desk between students. You have 180 minutes to complete this exam. The exam is closed book, no computers, PDAs or calculators. You may use two pages (US Letter, front and back) of notes and the green sheet. There may be partial credit for incomplete answers; write as much of the solution as you can. We will deduct points if your solution is far more complicated than necessary. When we provide a blank, please fit your answer within the space provided. IEC format refers to the mebi, tebi, etc prefixes. You must complete ALL THE QUESTIONS, regardless of your score on the midterm. Clobbering only works from the Final to the Midterm, not vice versa. You have 3 hours... relax. Question M1 M2 M3 Ms F1 F2 F3 F4 Fs Total Minutes 20 20 20 60 30 30 30 30 120 180 Points 10 10 10 30 22 23 22 23 90 120 Score

M1) What s that funky smell? Oh, it s Potpourri (10 pts, 20 min) (ThisisforM1a,M1b)AvariantoftheMIPSinstructionsethasbeendevelopedonthenew64@bitsystemforcloud computing.toemphasizethe big inbigdata,thesizeoftheopcodefieldhasbeenincreasedbyonebit,andthe number of registers (whose width is now 64@bits) was quadrupled. Assume that instructions expand the rightmost field to fill all64bits.sohere,thefunct field would expand; in an I@type instruction, the Immediate wouldexpand.you0may0use0the0boxes0below0as0scratch0space. 0 Opcode0 Rs0 Rt0 Rd0 Shamt0 Funct0 a) Howmanytotalinstructionscanwehave?Pleaseleaveyouranswerasanexpressioninvolvingpowersof2. b) WhatisthemaximumamountofbyteswecouldchangethePCbywithasinglebranchinstruction? (Express0your0answer0in0IEC0format,0e.g.,0320Kibi,0160Mebi,0etc.)0 typedef struct { int32_t len; double *data; vector; double min = -0.8, max = 1.7; vector *filter(char *filename) { char *extension =.txt ; int32_t i = 0; vector *output = malloc(sizeof(vector));... //! here c) Assumeourmemoryis32@bitaddressed.How0many0bytes0are0allocated0in0each0section0of0memory0as0a0result0of0 these0lines0(up0to0! here )?IncludeallocationsONLYfromthelinesshown;assumeregistersarenotused. Stack: Heap: Static: d) Completethefollowingcodesoitobeysitscomments: typedef struct list { int val; struct list *next; list; /* Swaps values at the front of lists x and y */ void swap_heads(list *x, list *y) { list tmp = {y->val, x; // new list node x = tmp; y->val = x->next->val; beforethecallto swap_heads: pi 2 1 4 e 3 7 1 andafterthe swap_heads(pi,e) call (notetheheadvaluesswapped): 3 1 4 e 2 7 1 e) Whatistheratioof the#ofnumbersadoublecanencodebetween0.5and1to the#ofnumbersadoublecanencodebetween5and7? (Express0your0answer0in0IEC0format,0e.g.,0320Kibi,0160Mebi,0etc.)0 pi 2/9

M2) Cache, money y all (10 pts, 20 min) We#have#a#standard#32@bit$byte@addressed"MIPS"machine"with"4"GiB"RAM,"a"4@way$set@associative)CPU)data)cache) that$uses$32$byte$blocks,$a$lru$replacement$policy,$and$has$a$total$capacity'of'16'kib."consider"the"following"c"code" and$answer$the$questions$below. #define SIZE_OF_A 2048 typedef struct { int x; int y[3]; node; int count_x(node *A, int x) { int k = 0; for (int i = 0; i < SIZE_OF_A; i++) if (A[i].x == x) { k++; return k; a) How$many$bits$are$used$for$the$tag,$index,$and$offset? Tag Index Offset b) We#call#count_x!with%all%values%of%x!from%0%to%65535%to%count%the%number%of!timesthat each%x!occursina."the"value"of"a!is#the#same#in#every#call."the!cache%is%cold%at%the% beginningofexecution.whatisthecache"hit"rate?"" Questions)(c))and)(d))below)are)two)independent!variations)on)the)original)problem. c) Let'ssaythatweincreaseourCPUcacheassociativityto8@way.% What%is%our!cache"hit$rate$now?$ d) WhatwouldbetheapproximatecacherateifwechangedourCPUcachetousea Most0Recently0Used(MRU)cachereplacementpolicy,andwechangethecachetobe fully0associative?! 3/9

M3) MIPS stands for MIPS is pretty sweet! (10 pts, 20 min) # a0 is a pointer to an array of ints # a1 is the length of the array Use this area for your own notes 1 mystery: la $t2, loop (Assume this takes only 1 TAL instruction) 2 loop: addiu $t9, $a0, 0 3 lw $t0, 0($t9) 4 addiu $t0, $t0, 1 5 sw $t0, 0($t9) 6 lw $t1, 0($t2) 7 addiu $t1, $t1, 4 8 sw $t1, 0($t2) 9 addiu $a1, $a1, -1 10 bne $a1, $0, loop 11 jr $ra a) At#a#functional#level,#what$does$this$code$do!(for% small!array#arguments),!calledforthefirsttime? b) Approximately-what$is$the$most%array$elements'this%code%can%handle%correctly!(called'for'the'first'time)?! (Express(your(answer(in(IEC(format,(e.g.,(32(Kibi,!16#Mebi,#etc.) c) What%would%happen%if!we#exceed!this%value!by#one#array#element#(called'for'the'first'time)?" (Mention(a(line%number,%an#instruction)field,"and"what"happens"when"the"code"is"run"to"completion!in#your#answer)0 d) You llnoticethatwekeepsaying calledforthefirsttime,'because'this!code%can%only%be%called%once!!!specify( three%mips%lines%you%could%put%after%line%10!(say%as#10.1,"10.2,"10.3)"that$would$allow$this$code$to$be$reused.!! (This%code%should%even$work%if#the#array#size#is#exceeded,#and$it$didn t!crashafter!what!you$described!in#(c)!happened) 10.1 10.2 10.3 4/9

F1) Madonna revisited: We Are Living in a Digital World (22 pts, 30 mins) a) Rewritethefollowingcircuit usingtheminimumnumber ofand,or,andnotgates: You0must0show0your0work0above0to0earn0points. b) Completethefollowingfinitestatemachinewhoseinputtakeson thevalueof X, Y,or Z.Themachineshouldoutputa1onlyifit hasseeneitheranoddnumberofxsoranoddnumberofys. Assumeyou veseennoxsorysatthestartstate.otherwise,the machineshouldoutputa0.addnonewstates.thelabelingof thestatesas(00,01,10,11)isarbitrary.we veaddedthefirst transitionarrowforyou,whichistakenifyou veseenanx. c) Considerthefollowingcircuitbelow.Assuming(1)thesignals allstartat0,(2)thexorgatehasnopropagationdelay,and (3)theregistersetup,hold,anddelayareallgivenbythesmall approximatelengthoft(about1/10 th theclockperiod),fill outthefollowingtimingdiagram. X/1 5/9

F2) V(I/O)rtual Potpourri (23 pts, 30 mins) Forthefollowingquestions,assumethefollowing: 16@bitvirtualaddresses 1KiBpages 512KiBofphysicalmemorywithLRUpagereplacementpolicy FullyassociativeTLBwith16entriesandanLRUreplacementpolicy a) Howmanyvirtualpagesarethereperprocess? b) Howmanybitswideisthepage0table0base0register? Forquestions(c)and(d),assumeonlythecodeandthetwoarraystakeupmemory,thearraysaredistinct,ALLof codefitsin1page,thearraysarepage@aligned(startonpageboundary),andthisistheonlyprocessrunning. char *mystrcpy(char *dst, char *src) { char *ptr = dst; while(*dst++ = *src++); return ptr; c) IfmystrcpywerecalledwithacharacterstringoflengthS, howmanypagefaultscanoccurintheworst@casescenario? d) Inthebest@casescenario,howmanyiterationsoftheloopcanoccurbeforeaTLBmiss? Youcanleaveyouranswerasaproductoftwonumbers. Forthenextthreestatements,circleTrueorFalse: e) [True/False]RAIDwasinventedatCalasawaytodecreasethenumberofdiskfailuresinasystem. f) [True/False]RAID1isthemostexpensiveRAIDconfigurationbutitoffersveryhighavailability. g) [True/False]WritingtodiskonRAID1isfasterthanonRAID5. 6/9

F3) Datapathology (22 pts, 30 mins) Considerthesinglecycledatapathas itrelatestoanewmipsinstruction, storeshiftleftlogical: ssll rd, rs, rt, shamt Theinstructiondoesthe following: 1) Readsthevalueofregister rtandshiftsitleftby shamtbits. 2) Storestheresultinto memorylocationr[rs]as! well!asregisterrd. ALUOut DataMemOut Ignore!pipelining!for!(a)4(c).! a) WritetheRegisterTransferLanguage(RTL)correspondingtossll rd, rs, rt, shamt b)changeaslittle0as0possibleinthedatapathabove(draw!your!changes!right!in!the!figure)toenablessll. Listallyourchangesbelow.Yourmodificationmayusemuxes,wires,constants,andnewcontrolsignals,but nothingelse.(youmaynotneedalltheprovidedboxes.) (i).! (ii)! (iii) c)wenowwanttosetallthecontrollinesappropriately.listwhateachsignalshouldbe,eitherbyanintuitive nameor{0,1, don tcare.includeanynewcontrolsignalsyouadded. RegDst RegWr npc_sel ExtOp ALUSrc ALUctr MemWr MemtoReg!!!!!!!!!! Forquestions(d)@(f),Assumethatyouhavea5@stagepipelinewithno BRANCH: ssll $t0, $t1, $t2, 5 forwarding,nointerlock,andnobranchdelayslots.readthecodebelow lw $t3, 0($t1) andanswerthefollowingquestions. addi $t5, $t1, 11 beq $t3, $t1, BRANCH d) Whatkind(s)ofhazardsexistintheselinesofcode? e) Howmanystallsareneededtoresolvethem? f) Ifyouweretohavebranchdelayslotsbutnoforwardingnorinterlock, whatistheoptimalnumberofcyclesoneiterationofthiscodewouldtake? 7/9

F4) What do you call two L s that go together? (22 pts, 30 mins) We0will0drop0the0lowest0score0of0parts0(a),0(b),0(c)0and0(d)0below,0each0are0worth040points.0 We vedonealotofworkthroughoutthesemesterwithdotproducts(alsocalledinnerproducts),let sswitch thingsupandtakesomeouterproducts.wherethedotproductbetweentwocolumnvectorsisdefinedtobe!!!!,0 theouterproductbetweentwoisdefinedtobe!!!.ifwelet! =!!!!,thenit sstraightforwardtoseethat!!"! =!!!!!.Ourgoalinthisproblemisgoingtobetotrytoparallelizethiscomputationinacoupleofdifferent ways,startingfromthefollowingsourcecode: void outer_product(float* dst, float *x, float *y, size_t n) { for (size_t i = 0; i < n; i += 1) for (size_t j = 0; j < n; j += 1) dst[i*n + j] = x[i] * y[j]; a) UsingtheopenMPdirectiveswelearnedinlabandlecture,parallelizeouter_product().Youshouldoptimize performancewhilestillguaranteeingcorrectness.youmaynotneedeveryblank. void outer_product(float* dst, float *x, float *y, size_t n) { for (size_t i = 0; i < n; i += 1) for (size_t j = 0; j < n; j += 1) dst[i*n + j] = x[i] * y[j]; b) NowuseSSEintrinsicstooptimizeouter_product().Youmayfindthefollowinguseful: _mm_loadu_ps( m128 *src) _mm_storeu_ps( m128 *dst, m128 val) _mm_load1_ps(float *src) _mm_mul_ps( m128 a, m128 b) Youmayassumethatnisamultipleof4forthispartoftheproblem. void outer_product(float* dst, float *x, float *y, size_t n) { for (size_t i = 0; i < n; i += 1) for (size_t j = 0; j < n; j += 1) mm_storeu_ps(dst[ i*n + j ], products); j += 3; 8/9

F4) What do you call two L s that go together? (Continued) c)nowwriteacudakerneltocomputeouterproducts global void outer_product_kernel(float *dst, float *x, float *y, size_t n) { size_t j = threadidx.x+blockdim.x*blockidx.x; size_t i = threadidx.y+blockdim.y*blockidx.y; if ( ) { d) Finally,we regoingtousemapreducetogenerateouterproducts.we regoingtoassumetheexistenceofa TupledatatypethathasaconstructortakingintwoLongs(orLongWritables),andcreatesapairfromthetwo ofthem.alsoassumethatvectorsarerepresentedasacollectionofkvpairsmappingfromscalarindicesto values,whilematricesarerepresentedaskvpairsmappingtupleindicestovalues.completemapsothatit willworkcorrectlywithreduce.youmaycallgetn()atanytimetograbthelengthofthevectorsbeing processed. void Map(LongWritable idx, DoubleWritable val) { for ( long z = 0 ; ; ) { if (is_x_vector()) // magically detects if we re currently processing a value from x else context.write( ); context.write( ); void Reduce(Tuple idx, Iterable<DoubleWritable> vals) { context.write(idx, vals.next() * vals.next()); e) Whichoftheseapproacheswouldyouexpecttooperatethemostefficientlyonmediumsizedinputs(order 1000elementsineachvector)?Assumethatwe reusinghivemachines,oranamazon clusterliketheone usedinproject2inthecaseofmapreduce.giveabrief1@2sentenceexplanationofwhytheotherapproaches wouldfareworse. f) Whichoftheseapproacheswouldscaletothelargestinputbeforecomputingincorrectanswers,orfailingto computeanswers?whichwouldfailonthesmallestinput?explainin1@2sentences. 9/9