Review of Basic Computer Architecture

What is Computer Architecture?

From Wikipedia, the free encyclopedia:
  In computer science and engineering, computer architecture refers to the specification of the relationship between the different hardware components of a computer system. It may also refer to the practical art of defining the structure and relationship of the subcomponents of a computer.
  "This article needs attention from an expert in computer science."

What is Computer Architecture? Definition 2.0

From an expert in computer science:
  In computer science and engineering, computer architecture refers to the study of performance in computer systems. It also refers to the practical science of applying performance theory to specifying the structure and relationship of the subcomponents of a computer.

Theory, Goals, Specification

Theory
  Computation components
    CPU: ALU + memory + control
    Instructions
Performance
  Run-time speed
  Run time of what?
  Compared to what?
Requirements
  Word processing
  Number crunching
  Gaming
  Web server
  Real time
Specification
  Requirements + performance theory → component implementation

von Neumann Architecture

[Figure: von Neumann machine with input, memory, output, controller, and Arithmetic Logic Unit (ALU), connected by a data/instruction path and a control path]

Requirements Relative to Application

Fastest CPU
  Intel Xeon
  High-end version of the Intel x86-64 processor family
    IA-32 instruction set
    P6 micro-architecture + enhancements (NetBurst → Ivy Bridge)
    Pentium II → Pentium III → Pentium 4 → Multicore
  2 times faster than best competitor
Fastest supercomputer
  IBM Sequoia BlueGene/Q
  CPU: Power BQC 16C 1.60 GHz
  98,304 compute nodes, 1,572,864 processor cores
  1.6 PB memory = 1.6 Mega GB (1638 TB)
  Energy efficient: 3000 Mflops/watt, 1/3 of best competitor
Smartphones
  ARM CPU
  Low power
  Higher performance per watt than x86

Fundamental Architectural Abstractions

Digital computer
  Machine that can be programmed to process symbols
Data
  Symbol with no intrinsic meaning to machine
  User imposes meaning: integer, float, string, ...
Operation
  Symbol describing processing of data symbols
  Machine interprets meaning: data transfer, ALU, control, OS, ...
Instruction
  Symbol describing operation on data
  Machine language: collection of legal instructions
Addressing mode
  Specifies data location as operand
  Source operand: data input to operation
  Destination operand: data output from operation

Stages in Computer Design

Instruction Set Architecture (ISA)
  1. Define universe of problems to be solved
  2. Study candidate operations at level of system programmer
       Atomic operations complete sequentially
       General operation is a combination of atomic operations
  3. Specify instruction set for machine language
       Choose minimum set of orthogonal operations
       Not too many ways to solve same problem
Implementation
  1. Design machine as implementation of ISA
  2. Evaluate theoretical performance
  3. Identify performance problem areas
  4. Improve processor efficiency

Typical Operations

  Data transfer             Load (r ← m), store (m ← r), move (r/m ← r/m), convert data types
  Arithmetic/Logical (ALU)  Integer arithmetic (+, compare, shift) and logical (AND, OR, NOR, XOR)
  Decimal                   Integer arithmetic on decimal numbers
  Floating point (FPU)      Floating point arithmetic (+, sqrt, trig, exp, ...)
  String                    String move, string compare, string search
  Control                   Conditional and unconditional branch, call/return, trap
  Operating system          System calls, virtual memory management instructions
  Graphics                  Pixel operations, compression/decompression operations

Memory Hierarchy

Long-term storage
  Memory locations outside CPU and RAM
  Stores data and instructions of "all" programs
  Organized by OS
Main memory (RAM)
  Memory location outside CPU
  Stores "all" data and instructions of running programs
  Organized by addresses
Cache
  Memory location in or near CPU
  Fast access to important data and instructions from RAM
  Copy of RAM section
Registers
  Memory location inside CPU
  Fast access to small amount of information
  Organized by CPU

CPU and Memory Hierarchy

  Level               Holds
  Long-term storage   All files and data
  Main memory (RAM)   Running programs and data
  Cache               Next few instructions and data
  Registers           Current data

CPU controller accesses L1 cache
  if (L1 cache hit) {
      access performed in 1 clock cycle
  } else {
      L1 cache miss
      L1 cache accesses cache controller
      cache controller initiates access to L2 and main memory
      if (address in L2 cache) {
          controller copies contents to L1 from L2
      } else {
          controller copies location to L1 from main memory
      }
  }

Register access in 1 CC
Disk access latency >> clock cycle
Cache miss penalty: address not in L1 → delay in memory access

[Figure: CPU (ALU, registers, L1 instruction cache, L1 data cache) connected through the cache controller to L2 cache, main memory, and I/O disk]

Specifying Operands: Addressing Modes

Immediate
  Constant IMM = numerical value coded into instruction
Register operands
  Register name = a CPU storage location
  REGS[register name] = data stored in register
  REGS[R3] = data stored in register R3
Memory operands
  Address = a memory storage location
  MEM[address] = data stored in memory
  MEM[223344] = data stored at address 223344
Effective Address (EA): pointer arithmetic
  REGS[R3] = &(variable)
  MEM[REGS[R3]+4] = *(&(variable)+4) = *(REGS[R3]+4)

Addressing Modes

  Mode                  Syntax        Access                                          Use
  Register              R3            Regs[R3]                                        Register data
  Immediate             #3            3                                               Constant
  Direct (absolute)     (100)         Mem[100]                                        Static data
  Register deferred     (R1)          Mem[Regs[R1]]                                   Pointer
  Displacement          100(R1)       Mem[100+Regs[R1]]                               Local variable
  Indexed               (R1 + R2)     Mem[Regs[R1]+Regs[R2]]                          Array addressing
  Memory indirect       @(R3)         Mem[Mem[Regs[R3]]]                              Pointer to pointer
  Auto increment        (R2)+         Mem[Regs[R2]], then Regs[R2] ← Regs[R2]+d       Stack access
  Auto decrement        -(R2)         Regs[R2] ← Regs[R2]-d, then Mem[Regs[R2]]       Stack access
  Scaled                100(R2)[R3]   Mem[100+Regs[R2]+Regs[R3]*d]                    Indexing arrays
  PC-relative           (PC)          Mem[PC+value]                                   Load instruction to data register
  PC-relative deferred  100(PC)       Mem[PC+Mem[100]]                                Load instruction to data register
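The Access column of the addressing-mode table can be read as a small lookup procedure. The following is a minimal sketch of that reading, not part of the original slides: the register names, memory contents, and the function `operand` are illustrative assumptions, and only the side-effect-free modes are modeled.

```python
# Toy register file and memory. All names and values are illustrative assumptions.
REGS = {"R1": 200, "R2": 8, "R3": 300}
MEM = {100: 7, 200: 11, 208: 13, 300: 200, 304: 45}


def operand(mode, reg=None, reg2=None, disp=0):
    """Return the operand value selected by one addressing mode."""
    if mode == "register":            # Regs[R]
        return REGS[reg]
    if mode == "immediate":           # constant coded into the instruction
        return disp
    if mode == "direct":              # Mem[disp]
        return MEM[disp]
    if mode == "register_deferred":   # Mem[Regs[R]]
        return MEM[REGS[reg]]
    if mode == "displacement":        # Mem[disp + Regs[R]]
        return MEM[disp + REGS[reg]]
    if mode == "indexed":             # Mem[Regs[R] + Regs[R2]]
        return MEM[REGS[reg] + REGS[reg2]]
    if mode == "memory_indirect":     # Mem[Mem[Regs[R]]]
        return MEM[MEM[REGS[reg]]]
    raise ValueError(f"unknown addressing mode: {mode}")


# Effective-address example in the style of the slides: Mem[Regs[R3] + 4]
value = operand("displacement", reg="R3", disp=4)
```

Auto increment, auto decrement, and scaled modes are left out because they also update a register or depend on the operand size d, which the sketch does not model.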

Commitment to State

Internal registers
  Temporary registers used in executing machine instructions
  Not visible to programs
Architectural state
  CPU registers visible to programs
System state
  All data resources visible to programs
  Architectural state + system memory
Commitment to state
  Update of system state
  Write to architectural state / system memory

Complex Instruction Set Computer (CISC)

Classic machine design
  300 instruction types
  5 addressing modes
  10 data types
  Complex machine implementations
Mainframes
  Large, expensive, centralized computers for big business and government
  Manufacturers: IBM, Control Data, Burroughs, Honeywell
Minicomputers
  Smaller computers for smaller organizations
  Manufacturers: Digital (PDP/VAX), Data General (Eclipse)
CISC microprocessors
  6800 (1974) and 8086 (1978) designed as tiny CISC on chip
  Apple II (1977): 6502 (1975)
  IBM PC (1981): 8088 (1979)
  Intel x86 for PC/Mac is the last CISC ISA still manufactured

Why CISC?

Semantic gap argument
  Computer language should imitate natural language
  Large vocabulary + high redundancy → flexibility + power
Terrible compilers
  Limited optimization
  Limited error messaging
  Efficient code written or optimized in assembly language
Expensive memory
  RAM ~ $5000/MB wholesale in 1977
  RAM ~ $0.01/MB in 2012
Implications for machine language
  Design for user-friendly programming and small memory use
  Many highly specific instructions using many addressing modes
  Compact instruction codes that perform a lot of work

Physical Implementation of CISC: Generic Machine

[Figure: generic CISC machine. The ALU subsystem (registers, ALU operation, ALU result flag, ALU inputs and OUT), the status word, the instruction decoder, PC, IR, MAR, MDR, control, and main memory are connected by the system bus]

  PC - program counter
  IR - instruction register
  MAR - memory address register
  MDR - memory data register

Decoding Machine Instructions

Machine language
  SUB R1, R2, 100(R3)
Microcode sequence (microprogram)
  ALU_IN ← R3
  ALU ← 100
  ADD
  MAR ← OUT
  READ
  ALU_IN ← MDR
  ALU ← R2
  SUB
  R1 ← OUT
Microcode instruction
  Hardware-level atomic operation
  9 lines → 9 clock cycles

[Figure: the generic CISC machine (registers, status word, instruction decoder, PC, IR, MAR, MDR, ALU subsystem, control, main memory, system bus) executing the microprogram]

Run Time and Clock Cycles

CPU is timed by a periodic signal called the clock (CLK)
  Clock cycle (CC) time = seconds per cycle
  An instruction requires 1 or more clock cycles to process
  Clock rate = cycles per second = Hz (Hertz)
Run time
  = (clock cycles to run program) × (seconds per clock cycle)
  = (clock cycles to run program) / (clock cycles per second)
Higher clock rate → shorter run time
More clock cycles (at constant clock rate) → longer run time

[Figure: Intel 386 microprocessor]

Basic Performance Measures

Run time
  Elapsed time $T$ from start to finish of a defined program task
Latency
  Excess response time; depends on context
Throughput
  Number of defined tasks performed per unit time
  Throughput $= 1 / (T + \text{latency between tasks})$
Enhancement
  Change to system → new run time $T'$
Speedup
  $S = T / T'$, so $S > 1 \Leftrightarrow T' < T$
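As a quick numerical check of the run-time and speedup relations above, here is a minimal sketch. The function names and the cycle counts and clock rates in the example are assumptions chosen only for illustration.

```python
def run_time(clock_cycles, clock_rate_hz):
    """Run time in seconds = clock cycles / (clock cycles per second)."""
    return clock_cycles / clock_rate_hz


def speedup(t_old, t_new):
    """Speedup S = T / T'. S > 1 exactly when the new run time is shorter."""
    return t_old / t_new


def throughput(t, latency_between_tasks):
    """Tasks per second = 1 / (T + latency between tasks)."""
    return 1.0 / (t + latency_between_tasks)


# Same program (2e9 clock cycles) on a 1 GHz clock and on a 2 GHz clock.
t_slow = run_time(2e9, 1e9)    # 2.0 seconds
t_fast = run_time(2e9, 2e9)    # 1.0 second
s = speedup(t_slow, t_fast)    # 2.0
```

Doubling the clock rate at a constant cycle count halves the run time, which is the "higher clock rate, shorter run time" statement in numbers.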

Definitions

  $T$        total run time of program
  $t_i$      total run time of instructions in group $i$
  $IC_i$     number of instructions in group $i$ (Instruction Count)
  $CPI_i$    number of clock cycles to run an instruction in group $i$ (Cycles Per Instruction)
  $N_i$      number of clock cycles to run all instructions in group $i$
  $R$        clock rate = clock frequency = clock cycles per second, in Hertz (Hz)
  $\tau$     seconds per clock cycle, $\tau = 1/R$
  $IC$       total number of instructions in program
  $N$        total number of clock cycles to run program
  $CPI$      average number of clock cycles per instruction for the program
  quantity'  new value of quantity after architectural change

CPU Equation

Clock cycles to run all instructions of type $i$
  $N_i = CPI_i \times IC_i$  (clock cycles per instruction of type $i$ × instructions of type $i$)
Total clock cycles to run all instructions in program
  $N = \sum_i N_i = \sum_i CPI_i \times IC_i$  (sum over all groups)
Average number of clock cycles per instruction for program
  $CPI = \dfrac{N}{IC} = \sum_i CPI_i \times \dfrac{IC_i}{IC}$
The ratio $IC_i / IC$ is the proportion (percent) of instructions in group $i$, so $CPI$ is a weighted average of the $CPI_i$.

CPU Run Time

Run time of one instruction of type $i$
  $CPI_i \times \tau$  (clock cycles per instruction of type $i$ × seconds per clock cycle)
Run time for all instructions of type $i$
  $t_i = CPI_i \times IC_i \times \tau$
Total run time for program
  $T = \sum_i t_i = \tau \sum_i CPI_i \times IC_i = CPI \times IC \times \tau$
So
  $T = CPI \times IC \times \tau$  (clock cycles per instruction × number of instructions × seconds per clock cycle)

Amdahl Equation

  $F_i = t_i / T$      relative run time of instructions in group $i$
  $S_i = t_i / t_i'$   speedup for instructions in group $i$

  $t_i' = \dfrac{t_i}{S_i} = \dfrac{F_i T}{S_i}$
  $T' = \sum_i t_i' = T \sum_i \dfrac{F_i}{S_i}$
  $S = \dfrac{T}{T'} = \dfrac{1}{\sum_i F_i / S_i}$

Enhancement to group $e$ only ($S_i = 1$ for $i \ne e$)
  $S = \dfrac{1}{\sum_{i \ne e} F_i + \dfrac{F_e}{S_e}} = \dfrac{1}{(1 - F_e) + \dfrac{F_e}{S_e}}$

Amdahl's "Law"
  Speedup limited by $F_e$
  Enhance maximum $F_e$
  Accept impairment to small $F_e$
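The CPU equation and the Amdahl equation translate directly into a few lines of code. This is a minimal sketch under stated assumptions: the function names and the two-group instruction mix are invented for illustration and are not taken from the slides.

```python
def average_cpi(groups):
    """groups is a list of (IC_i, CPI_i) pairs. Returns CPI = N / IC."""
    total_cycles = sum(ic * cpi for ic, cpi in groups)      # N = sum of CPI_i * IC_i
    total_instructions = sum(ic for ic, _ in groups)        # IC
    return total_cycles / total_instructions


def cpu_time(groups, tau):
    """T = CPI * IC * tau, computed as tau * sum of CPI_i * IC_i."""
    return tau * sum(ic * cpi for ic, cpi in groups)


def amdahl_speedup(f_e, s_e):
    """Overall speedup when only group e (run-time fraction f_e) is sped up by s_e."""
    return 1.0 / ((1.0 - f_e) + f_e / s_e)


# Hypothetical mix: 600 instructions at 1 cycle each and 400 at 3 cycles each.
mix = [(600, 1), (400, 3)]
cpi = average_cpi(mix)                # 1800 cycles / 1000 instructions = 1.8
t = cpu_time(mix, 1e-9)               # 1800 cycles at 1 ns per cycle
s = amdahl_speedup(0.5, 2.0)          # 1 / (0.5 + 0.25)
```

Letting `s_e` grow without bound in `amdahl_speedup` approaches 1 / (1 - f_e), which is the sense in which the speedup is limited by F_e.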

Amdahl Equation in Parallel Processing

  $F_P$   fraction of processing that can be performed independently
  $n$     number of processing units

  $CPI_{n\ \text{processors}} = CPI \times \dfrac{F_P}{n}$ (work can be parallelized) $+\ CPI \times (1 - F_P)$ (work cannot be parallelized)
  $CPI_{n\ \text{processors}} = CPI \times \left( \dfrac{F_P}{n} + (1 - F_P) \right)$
  $S_{n\ \text{processors}} = \dfrac{CPI \times IC \times \tau}{CPI_{n\ \text{processors}} \times IC \times \tau} = \dfrac{1}{\dfrac{F_P}{n} + (1 - F_P)}$

SPEC Benchmark

Programs for system performance measurement + comparison
  Standard + repeatable
  Test system for realistic conditions
  Summary score for easy comparison
  Results posted online by SPEC
Specific test suites
  CINT: CPU integer instructions
  CFP: CPU FP instructions
  Performance as file server, web server, mail server
  Graphics
  Other advanced features
Updated every few years to reflect realistic conditions
  Based on current statistical distributions of computing tasks
  Current CPU test version: 2006
Reports speedup
  Run time compared with a standard machine

How SPEC Works

User runs $n$ programs on test machine
  Records run-time conditions
  Records program run time $T_i^{test}$ in seconds, $i = 1, 2, \ldots, n$
SPEC provides run times $T_i^{ref}$ on reference machine
  Sun Ultra Enterprise 2
  296 MHz UltraSPARC II processor
  Was a powerful Unix workstation in 1997
User calculates speedup for each program
  $S_i = \dfrac{T_i^{ref}}{T_i^{test}}, \quad i = 1, 2, \ldots, n$
User calculates geometric mean of speedups
  $S(\text{test machine on ref}) = \left( \prod_{i=1}^{n} S_i \right)^{1/n} = \left( \prod_{i=1}^{n} \dfrac{T_i^{ref}}{T_i^{test}} \right)^{1/n}$
  $S(\text{machine A compared to machine B}) = \dfrac{S(\text{machine A on ref})}{S(\text{machine B on ref})}$

Typical SPEC Report

  Base = standard configuration
  Peak = specialist configuration

  SPEC(R) CINT2006 Summary
  Sun Microsystems Sun SPARC Enterprise M8000
  Wed Mar 2 22:23:
  CPU2006 License #6
  Test sponsor: Sun Microsystems    Tester: Fujitsu Limited
  Test date: Mar-2007    Hardware avail: Apr-2007    Software avail: May-2007

  Columns: Benchmark; Base Ref. Time, Base Run Time, Base Ratio; Peak Ref. Time, Peak Run Time, Peak Ratio
  Benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk
  Summary scores: SPECint(R)_base2006 and SPECint2006
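The SPEC scoring procedure described above (per-program speedup against the reference machine, then a geometric mean) can be sketched as follows. The function names and the run times in the example are made up for illustration; they are not SPEC data.

```python
import math


def spec_score(ref_times, test_times):
    """Geometric mean of the per-program speedups S_i = T_ref_i / T_test_i."""
    ratios = [ref / test for ref, test in zip(ref_times, test_times)]
    # Averaging logarithms is equivalent to taking the n-th root of the product.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def compare(score_a, score_b):
    """Speedup of machine A relative to machine B, both scored on the same reference."""
    return score_a / score_b


# Hypothetical run times in seconds for two benchmark programs.
ref = [100.0, 400.0]
machine_a = [50.0, 100.0]      # speedups 2 and 4, geometric mean sqrt(8)
machine_b = [100.0, 200.0]     # speedups 1 and 2, geometric mean sqrt(2)

score_a = spec_score(ref, machine_a)
score_b = spec_score(ref, machine_b)
a_vs_b = compare(score_a, score_b)    # sqrt(8) / sqrt(2) = 2
```

A useful property of the geometric mean here is that `compare` gives the same answer regardless of which reference machine was used, because the reference times cancel in the ratio.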

Typical SPEC Report 2

HARDWARE
  CPU Name: SPARC64 VI
  CPU Characteristics:
  CPU MHz: 2280
  FPU: Integrated
  CPU(s) enabled: 32 cores, 16 chips, 2 cores/chip, 2 threads/core
  CPU(s) orderable: 1 to 4 CMUs; each CMU contains 2 or 4 chips
  Primary Cache: 128 KB I + 128 KB D on chip per core
  Secondary Cache: 5 MB I+D on chip per chip
  L3 Cache: None
  Other Cache: None
  Memory: 64 GB (64 x 1 GB, see notes for details)
  Disk Subsystem: 73 GB 10,000 RPM Fujitsu MAY2073RC SAS
  Other Hardware: None
SOFTWARE
  Operating System: Solaris 10 11/06
  Compiler: Sun Studio 12 (Early Access)
  Auto Parallel: No
  File System: ufs
  System State: Default
  Base Pointers: 32-bit
  Peak Pointers: 32-bit
  Other Software: None

Representative CINT Results

Columns: Sponsor, Processor, Clock (GHz), Auto Parallel, Total Chips, Total Cores, Total Threads, Base

  Sponsor            Processor             Clock (GHz)  Auto Parallel  Base
  Hypertechnologies  Intel Core X          4.5          Yes
  Supermicro         Intel Core K          4.4          Yes
  NEC                Intel Xeon E                       Yes
  Huawei             Intel Xeon E                       Yes
  Supermicro         Intel Core                         Yes
  Dell               Intel Xeon E                       Yes
  Intel              Intel Core 2 Duo E                 Yes
  Intel              Intel Core 2 Duo E                 No
  Dell               Pentium                            No             .5
  Intel              Intel Pentium M                    No             0.7

Representative CFP Results

Columns: Sponsor, Processor, Clock (GHz), Auto Parallel, Total Chips, Total Cores, Total Threads, Base

  Sponsor            Processor             Clock (GHz)  Auto Parallel  Base
  HPE                Intel Xeon E                       Yes
  Hypertechnologies  Intel Core X          4.5          Yes
  HPE                Intel Xeon E                       Yes
  Dell               Intel Xeon E                       Yes
  Supermicro         Intel Core K          4.4          Yes
  Supermicro         Intel Core                         Yes
  Intel              Intel Core 2 Duo E                 Yes
  Intel              Intel Core 2 Duo E                 No
  Dell               Pentium                            No             2.2

Benchmarking a Processor Design

Specify Instruction Set Architecture (ISA)
  Specifies machine language for proposed CPU
  Provides human-readable assembly language
Determine $CPI_i$ for each instruction group $i$
  Count clock cycles required to implement each instruction in ISA
Write compilers for proposed machine language
  C, C++, Fortran
Compile benchmark programs to machine language
  Programs from SPEC CINT and CFP
Analyze compiler output (executable programs)
  Sort machine instructions into groups
  Calculate relative instruction count $IC_i / IC$ for each group
  Calculate average $CPI$ and overall run time $T$
  Compare run time with reference machine

CISC Creates Anti-CISC Revolution

Data General introduces Eclipse 32-bit CISC minicomputer
Digital (DEC) introduces VAX 32-bit CISC minicomputer
  First serious inexpensive competition to mainframe computers
  Serious computers became available to small organizations
  UNIX developed as minicomputer operating system
  TCP/IP developed to support networks of minicomputers
Computer Science emerged as a separate academic discipline
  Students needed topics for projects, theses, dissertations
Research results on minicomputer performance
  CISC uses machine resources inefficiently
  Most machine instructions are rarely used in programs
  CISC machines run slowly to support unnecessary features

RISC "Philosophy"

Technological developments from 1975 to 1990
  Price of RAM drops from $5000 / MByte (1975) to $5 / MByte (1990)
  Compilers become powerful and efficient with extensive optimization
  Portable code made practical by minicomputer, Unix, C, and TCP/IP
Principal research results on CISC performance
  ~ 90% of run time devoted to ~ 10% of instruction set
  ~ 90% of instructions in ISA rarely used
Reduced Instruction Set Computer (RISC)
  Apply Amdahl's "Law" to CISC ISA
    Speed up operations accounting for most of run time
    Ignore impairments to other instructions
  RISC ISA = only the most important CISC instructions
  Other CISC instructions → multiple RISC instructions
  RISC implementation executes its ISA in fast dedicated hardware

Instruction Types

Representative instruction distribution
  Five programs from SPECint92 benchmark suite
  Compiled for x86 instruction set (ISA for Intel 386/486/Pentium)

  Instruction          Relative proportion of total run time
  Load                 22%
  Conditional branch   20%
  Compare              16%
  Store                12%
  Add                   8%
  And                   6%
  Sub                   5%
  Move reg-reg          4%
  Call                  1%
  Return                1%
  Other                 5%
  Total               100%

First 10 instructions account for 95% of run time
Amdahl's "Law"
  Fast implementation of 95%
  Other 5% will not seriously degrade performance
  Must include unconditional branch for completeness

Ref: Hennessy / Patterson, figure

RISC Microprocessors

Simpler ISA
  Small set of uniform-length machine instructions
Simpler hardware
  No microcode: standard instruction implementation
  No central system bus: CPU processes several instructions at once
Lower CPI + higher clock speed
  Instruction completes on (almost) every clock cycle
All processors today use RISC technology
  Pure RISC (PowerPC, SPARC, MIPS, ARM, ...)
  RISC technology for CISC language (Pentium II through 4, Centrino, Core)
  Explicitly parallel RISC (Intel Itanium, IBM mainframes)

Typical RISC ISA

Data types
  32-bit / 64-bit integer and floating point
Memory
  Flat memory model with 32-bit / 64-bit address
  Address mode: disp(Rn) ~ Mem[Regs[Rn] + disp]
  Register-register operation model
Registers
  Integer registers
  FP registers
  OS (kernel mode) registers
  Result flags
  Read-only (value 0) and write-only (null) registers
Instruction types
  Load, store, move register-register
  Integer add, sub, mult, div, shift, compare
  Boolean and, or, xor
  Floating point add, sub, mult, div, sqrt, compare
  Jump, jump register, jump and link, conditional branch

Typical Instruction Encoding

Instruction types for Alpha 64-bit RISC processor

  Opcode | Number                      PALcode type
  Opcode | Ra | Disp                   Branch type
  Opcode | Ra | Rb | Disp              Memory type
  Opcode | Ra | Rb | Function | Rc     Operate type

  Opcode (6 bits) identifies operation to CPU
  Ra, Rb (5 bits) identify register names (R0 to R31)
  PALcode (Privileged Architecture Library): hardware support for OS
  Branch: test Ra; if true, Ra ← PC and PC ← PC + Disp
  Memory: move data between Ra and Mem[Regs[Rb] + Disp]
  Operate R/R: Rc ← Ra function Rb (register name)
  Operate R/I: Rc ← Ra function Imm (in Rb and 3 bits of function)

Simple RISC Physical Implementation

Early PowerPC implementation

  Stage 1              Stage 2              Stage 3    Stage 4
  Instruction Fetch    Instruction Decode   Execute    Memory Access / Write Back

No system bus: instructions proceed from left to right (assembly line)
Separate cache memory for instructions and data
Simple repetitive operations
  1. Fetch uniform-length instructions
  2. Decode instruction and read source operands from registers
  3. Execute ALU instructions and calculate addresses
  4. Access memory and/or write destination operands (commit to state)

Pipelining: The RISC Advantage

Instruction Level Parallelism (ILP)
  Hardware starts second instruction before first completes
  Typically 4 instructions in various stages of execution at one time
One CC per stage per instruction
4 clock cycles per instruction

[Figure: pipeline diagram showing instructions I1 to I6 moving through Stage 1 to Stage 4 on successive clock cycles (CC)]
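The memory-type layout above (Opcode, Ra, Rb, Disp) can be unpacked with shifts and masks. The sketch below assumes a 32-bit instruction word with the fields packed from the most significant bit down and a 16-bit signed displacement; the slides state only the 6-bit opcode and 5-bit register fields, so the word size and displacement width are assumptions added here, as are the function names and the sample field values.

```python
def encode_memory_format(opcode, ra, rb, disp):
    """Pack Opcode (6 bits) | Ra (5 bits) | Rb (5 bits) | Disp (16 bits, assumed)."""
    return ((opcode & 0x3F) << 26) | ((ra & 0x1F) << 21) | ((rb & 0x1F) << 16) | (disp & 0xFFFF)


def decode_memory_format(word):
    """Unpack a 32-bit memory-type instruction word into its four fields."""
    opcode = (word >> 26) & 0x3F     # top 6 bits
    ra = (word >> 21) & 0x1F         # next 5 bits: register number 0..31
    rb = (word >> 16) & 0x1F         # next 5 bits: register number 0..31
    disp = word & 0xFFFF             # low 16 bits
    if disp & 0x8000:                # interpret the displacement as signed
        disp -= 0x10000
    return opcode, ra, rb, disp


# Arbitrary field values chosen only to exercise the round trip.
word = encode_memory_format(0x29, 1, 3, 100)
fields = decode_memory_format(word)
```

With 5-bit register fields, the largest register number that can be encoded is 31, which is why the register names run from R0 to R31.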

Instruction Oriented View

[Figure: instructions I1 to I6 plotted against clock cycles, each passing through Fetch, Decode, Execute, and Write (W)]

Clock cycles to run $IC$ instructions
  $N = IC + (\text{pipeline length} - 1)$
Ideal CPI
  $CPI_{ideal} = \dfrac{N}{IC} = \dfrac{IC + (\text{pipeline length} - 1)}{IC} = 1 + \dfrac{\text{pipeline length} - 1}{IC} \to 1$ for large $IC$
Ideal run time
  $T_{ideal} = CPI_{ideal} \times IC \times \tau \to IC \times \tau = \dfrac{\text{instructions}}{\text{clock rate}}$ for large $IC$

Pipeline Imbalance

  Stage 1              Stage 2              Stage 3    Stage 4
  Instruction Fetch    Instruction Decode   Execute    Memory Access / Write Back

Instruction executes in 4 clock cycles
Clock cycle time determined by LOAD instruction
  Longest execution time
  $\tau_{fetch} \approx \tau_{decode} \approx \tau_{execute} \approx \tau_{memory\ access} \approx \tau_{register\ write\text{-}back} \approx \tau_{minimum}$
  $\tau_{clock\ cycle} > \tau_{memory\ access} + \tau_{register\ write\text{-}back} \approx 2\,\tau_{minimum}$
Most instructions do not access data memory in stage 4
  Only LOAD and STORE access data memory
  Only LOAD performs both memory access and register write-back
  Most operations can complete in time $\tau_{minimum}$

Superpipelining

  Stage 1   Stage 2   Stage 3   Stage 4   Stage 5
  Fetch     Decode    Execute   MEM       WB (Write Back)

Divide stage 4 into two stages
  Only load/store do useful work in MEM stage
Divide clock cycle time (double clock rate)
  $\tau \to \tau' = \tau / 2 \approx \tau_{minimum}$
  $S = \dfrac{CPI_{ideal} \times IC \times \tau}{CPI'_{ideal} \times IC' \times \tau'} = \dfrac{CPI_{ideal} \times IC \times \tau}{CPI_{ideal} \times IC \times \tau / 2} = 2$
Programs can run twice as fast

[Figure: five-stage pipeline diagram with instructions overlapping in stages F, D, E, M, W]

Pipeline Hazards

Data dependences
  Result of one instruction is source for later instruction
Hazard condition
  Processor runs uninterrupted but provides incorrect answers
Pipeline hazard
  Several instructions in various stages of execution
  Pipeline uses a resource value before update by earlier instruction
Example
  ADD R1,R2,R3
  SUB R4,R5,R1   ; hazard if SUB reads R1 before ADD writes R1
Hazard types
  Structural hazard   conflict over access to resource
  Data hazard         instruction result not ready when needed
  Control hazard      branch address and condition not ready when needed
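The ideal pipeline count and the superpipelining speedup discussed in this part of the slides can be checked with a short calculation. This is a sketch only; the function names and the instruction counts and cycle times in the example are assumptions for illustration.

```python
def pipeline_cycles(ic, pipeline_length):
    """Clock cycles to run ic instructions with no stalls: IC + (pipeline length - 1)."""
    return ic + (pipeline_length - 1)


def ideal_cpi(ic, pipeline_length):
    """Ideal CPI = N / IC, which approaches 1 as IC grows."""
    return pipeline_cycles(ic, pipeline_length) / ic


def run_time_ratio(cpi, ic, tau, cpi_new, ic_new, tau_new):
    """Speedup S = (CPI * IC * tau) / (CPI' * IC' * tau')."""
    return (cpi * ic * tau) / (cpi_new * ic_new * tau_new)


cycles_for_six = pipeline_cycles(6, 4)       # 6 instructions, 4 stages: 9 cycles
cpi_short = ideal_cpi(1, 4)                  # one instruction alone: 4.0
cpi_long = ideal_cpi(1_000_000, 4)           # very close to 1

# Superpipelining: same ideal CPI and IC, clock cycle time halved.
s = run_time_ratio(1.0, 1000, 2e-9, 1.0, 1000, 1e-9)
```

The fill cost of (pipeline length - 1) cycles is paid once per uninterrupted instruction stream, so it matters for short sequences and vanishes in the average for long ones.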

Dealing with Hazards

Avoid error
  Pause pipeline and wait for resource to be available
  Called WAIT STATE or PIPELINE STALL
Degrades processor performance
  Adds stall clock cycles (wasted time) to instruction execution
  $CPI = \dfrac{\text{processing clock cycles (ideal)} + \text{stalled clock cycles}}{\text{completed instructions}}$
  $CPI = \dfrac{N_{ideal} + N_{stall}}{IC} = \dfrac{N_{ideal}}{IC} + \dfrac{N_{stall}}{IC} = CPI_{ideal} + CPI_{stall} \to 1 + CPI_{stall}$ for large $IC$
Eliminate cause of stall
  Improve implementation based on analysis of stalls
  Main activity of hardware architects
  Performance degradation $= \dfrac{CPI - CPI_{ideal}}{CPI} = \dfrac{CPI_{stall}}{CPI_{ideal} + CPI_{stall}} = \dfrac{CPI_{stall}}{1 + CPI_{stall}}$

Structural Hazards

Conflict over access to resource
Typical structural hazard: unified cache hazard
  Instructions and data in same memory device
  Cannot access data and fetch instruction on same clock cycle
To prevent hazard
  Stall INSTRUCTION FETCH during data MEMORY ACCESS

[Figure: pipeline over CC1 to CC5 in which Instruction Fetch and Memory Access both address the unified cache]

Stall Implementation for Cache Hazard

On CC5, the Load Word (LW) instruction blocks Instruction Fetch (IF)
  No instruction is fetched on CC5
  No instruction (NOP) is forwarded to ID on CC6
  NOP bubble φ forwarded to EX on CC7, etc.

[Figure: pipeline tables over CC1 to CC10 with stages IF, ID, EX, MEM, WB, showing the LW instruction advancing and the bubble φ following it]

Calculating Effect of Cache Hazard on CPI

  $CPI_{stall} = \dfrac{\text{stall cycles}}{\text{instructions}} = \dfrac{\text{stall cycles}}{\text{stall}} \times \dfrac{\text{stalls}}{\text{instruction}}$, summed over instruction types
  $CPI_{stall} = 1 \dfrac{\text{stall cycle}}{\text{data memory load stall}} \times \dfrac{IC_{load}}{IC} + 1 \dfrac{\text{stall cycle}}{\text{data memory store stall}} \times \dfrac{IC_{store}}{IC}$
  $CPI_{stall} = 1 \dfrac{\text{stall cycle}}{\text{data memory access stall}} \times \left( \dfrac{IC_{load}}{IC} + \dfrac{IC_{store}}{IC} \right)$
  $CPI_{stall} = 1 \times (0.25\ \text{loads per instruction} + 0.15\ \text{stores per instruction}) = 0.40\ \text{stall cycles per instruction}$
  $CPI = CPI_{ideal} + CPI_{stall} = 1.4$ (degradation 29%)

Assume: Loads ~ 25%, Stores ~ 15%, Other ~ 60%
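The cache-hazard calculation above can be reproduced in a few lines. The sketch uses the instruction mix stated in the slides (loads about 25%, stores about 15%) and one stall cycle per data memory access; the function names are assumptions introduced here.

```python
def stall_cpi(access_fractions, stall_cycles_per_access=1):
    """Stall cycles per instruction = stall cycles per stall * stalls per instruction."""
    return stall_cycles_per_access * sum(access_fractions)


def degradation(cpi_ideal, cpi_stall):
    """Fraction of run time lost to stalls: CPI_stall / (CPI_ideal + CPI_stall)."""
    return cpi_stall / (cpi_ideal + cpi_stall)


cpi_ideal = 1.0
cpi_stall = stall_cpi([0.25, 0.15])        # loads + stores: about 0.40
cpi_total = cpi_ideal + cpi_stall          # about 1.4
lost = degradation(cpi_ideal, cpi_stall)   # about 0.286, i.e. roughly 29%
```

Because every load and store stalls the fetch stage for one cycle, the stall contribution depends only on the fraction of memory-access instructions in the mix.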

Data Hazards

Instruction result not ready when needed
Classification (named for correct order of operations)
  Read After Write (RAW)
    Correct: I2 reads register after I1 writes to it
    Hazard: I2 reads register before I1 writes to it
    I2 uses incorrect value
  Write After Write (WAW)
    Correct: I2 writes to register after I1 writes to it
    Hazard: I2 writes to register before I1 writes to it
    Incorrect value stays in register
  Write After Read (WAR)
    Correct: I2 writes to register after I1 reads it
    Hazard: I2 writes to register before I1 reads it
    I1 uses incorrect value
  Read After Read (RAR)
    No hazard: reads do not affect registers
To prevent hazard
  Stall pipeline until result is ready

Control Hazards

Branch outcome affects program counter (PC)
  Taken: branch condition is true and PC ← PC + Disp
  Not taken: branch condition is false and PC not changed
  Target: result of calculation PC ← PC + Disp
Branch hazard
  Outcome not known until branch execution finishes
  Pipeline automatically fetches (default) instruction following branch
  Default instruction not correct if branch taken
To prevent hazard
  Flush default instructions
  Stall pipeline until branch condition and branch target are ready
  Delay in processing branch instructions is called branch penalty

Exception Hazards

Exception
  Hardware or software condition requiring special service routine
Interrupt
  Service response to external hardware event
  Usually asynchronous
    Not triggered by program instructions
    Does not affect validity of running instructions
Trap
  Service response to software condition in running program
  Usually synchronous
    Triggered by program instructions
    May stall or affect validity of running instructions
Hazard
  Multiple instructions in various stages of execution in pipeline
  How/where/when to interrupt pipeline
  Where is return-point?

Precise Exception

Return-point
  Follows atomic operation
    Previous operations commit all results to state
    No following operations commit any results to state
Precise exception
  Exception with well-defined return-point
  Service exception following atomic operation
  Restart execution at return-point without error

[Figure: instruction stream with the return-point marked. Instructions before the return-point commit all state, instructions after it (I5 to I8) commit no state, and control passes to the Interrupt Service Routine]
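The RAW / WAW / WAR classification from the Data Hazards slide can be expressed as a register-overlap test between an earlier instruction I1 and a later instruction I2. This is a minimal sketch: the tuple representation `(destination, sources)` and the function name are assumptions, and the function reports potential hazards only, since whether one actually occurs depends on pipeline timing.

```python
def classify_dependences(i1, i2):
    """i1 executes before i2. Each is (destination register, list of source registers)."""
    dest1, sources1 = i1
    dest2, sources2 = i2
    found = []
    if dest1 is not None and dest1 in sources2:
        found.append("RAW")    # I2 must read after I1 writes
    if dest1 is not None and dest1 == dest2:
        found.append("WAW")    # I2 must write after I1 writes
    if dest2 is not None and dest2 in sources1:
        found.append("WAR")    # I2 must write after I1 reads
    return found               # empty list: at most read-after-read, which is harmless


# The example from the Pipeline Hazards slide:
#   ADD R1,R2,R3
#   SUB R4,R5,R1   ; hazard if SUB reads R1 before ADD writes R1
add = ("R1", ["R2", "R3"])
sub = ("R4", ["R5", "R1"])
result = classify_dependences(add, sub)
```

A stall-based pipeline would hold I2 in decode whenever this test reports a dependence that the earlier instruction has not yet committed.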

Exception Hazards in 5 Stage Pipeline

Exceptions specific to each stage
  Memory access exception in IF or MEM
  Instruction exception in ID
  Arithmetic exception in EX
5 instructions in various stages of execution
  Where is return-point?
  How to handle subsequent partially executed instructions?

Berkeley Solution

Attach exception status field and source PC to instruction
Instruction raises exception
  Mark status field with exception
  Continue pipeline until prior instruction completes (reaches WB)
  RETURN-POINT ← PC of instruction that raises exception
  Flush pipeline (mark instructions in IF, ID, EX, MEM as NOP to cancel WB)
  PC ← EXCEPTION SERVICE ROUTINE (ESR)
Return from ESR depends on exception type

[Figure: pipeline tables over CC1 to CC10 with stages IF, ID, EX, MEM, WB. The instruction before the faulting one completes atomically, the instruction with the MEM error marks the return point, the following instructions are replaced by bubbles φ, and the ESR then enters the pipeline]


More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

High level vs Low Level. What is a Computer Program? What does gcc do for you? Program = Instructions + Data. Basic Computer Organization

High level vs Low Level. What is a Computer Program? What does gcc do for you? Program = Instructions + Data. Basic Computer Organization What s a Computer Program? Descrpton of algorthms and data structures to acheve a specfc ojectve Could e done n any language, even a natural language lke Englsh Programmng language: A Standard notaton

More information

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

ETAtouch RESTful Webservices

ETAtouch RESTful Webservices ETAtouch RESTful Webservces Verson 1.1 November 8, 2012 Contents 1 Introducton 3 2 The resource /user/ap 6 2.1 HTTP GET................................... 6 2.2 HTTP POST..................................

More information

AADL : about scheduling analysis

AADL : about scheduling analysis AADL : about schedulng analyss Schedulng analyss, what s t? Embedded real-tme crtcal systems have temporal constrants to meet (e.g. deadlne). Many systems are bult wth operatng systems provdng multtaskng

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management.

4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management. //7 Prnceton Unversty Computer Scence 7: Introducton to Programmng Systems Goals of ths Lecture Storage Management Help you learn about: Localty and cachng Typcal storage herarchy Vrtual memory How the

More information

INSTRUCTION LEVEL PARALLELISM

INSTRUCTION LEVEL PARALLELISM INSTRUCTION LEVEL PARALLELISM Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix H, John L. Hennessy and David A. Patterson,

More information

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt

More information

Verification by testing

Verification by testing Real-Tme Systems Specfcaton Implementaton System models Executon-tme analyss Verfcaton Verfcaton by testng Dad? How do they know how much weght a brdge can handle? They drve bgger and bgger trucks over

More information

Uniprocessors. HPC Fall 2012 Prof. Robert van Engelen

Uniprocessors. HPC Fall 2012 Prof. Robert van Engelen Uniprocessors HPC Fall 2012 Prof. Robert van Engelen Overview PART I: Uniprocessors and Compiler Optimizations PART II: Multiprocessors and Parallel Programming Models Uniprocessors Processor architectures

More information

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vdyanagar Faculty Name: Am D. Trved Class: SYBCA Subject: US03CBCA03 (Advanced Data & Fle Structure) *UNIT 1 (ARRAYS AND TREES) **INTRODUCTION TO ARRAYS If we want

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

Instruction Set Principles and Examples. Appendix B

Instruction Set Principles and Examples. Appendix B Instruction Set Principles and Examples Appendix B Outline What is Instruction Set Architecture? Classifying ISA Elements of ISA Programming Registers Type and Size of Operands Addressing Modes Types of

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

Oracle Database: SQL and PL/SQL Fundamentals Certification Course

Oracle Database: SQL and PL/SQL Fundamentals Certification Course Oracle Database: SQL and PL/SQL Fundamentals Certfcaton Course 1 Duraton: 5 Days (30 hours) What you wll learn: Ths Oracle Database: SQL and PL/SQL Fundamentals tranng delvers the fundamentals of SQL and

More information

Storage Binding in RTL synthesis

Storage Binding in RTL synthesis Storage Bndng n RTL synthess Pe Zhang Danel D. Gajsk Techncal Report ICS-0-37 August 0th, 200 Center for Embedded Computer Systems Department of Informaton and Computer Scence Unersty of Calforna, Irne

More information

3D vector computer graphics

3D vector computer graphics 3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres

More information

Instruction Pipelining

Instruction Pipelining Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Instruction Pipelining

Instruction Pipelining Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages

More information

Giving credit where credit is due

Giving credit where credit is due CSCE 23J Computer Organzaton Cache Memores Dr. Stee Goddard goddard@cse.unl.edu Gng credt where credt s due Most of sldes for ths lecture are based on sldes created by Drs. Bryant and O Hallaron, Carnege

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Instruction Pipelining Review

Instruction Pipelining Review Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Introduction to Programming. Lecture 13: Container data structures. Container data structures. Topics for this lecture. A basic issue with containers

Introduction to Programming. Lecture 13: Container data structures. Container data structures. Topics for this lecture. A basic issue with containers 1 2 Introducton to Programmng Bertrand Meyer Lecture 13: Contaner data structures Last revsed 1 December 2003 Topcs for ths lecture 3 Contaner data structures 4 Contaners and genercty Contan other objects

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Instruction Set Principles The Role of Compilers MIPS 2 Main Content Computer

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Lecture 4: Instruction Set Architecture

Lecture 4: Instruction Set Architecture Lecture 4: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation Reading: Textbook (5 th edition) Appendix A Appendix B (4 th edition)

More information

Real-Time Systems. Real-Time Systems. Verification by testing. Verification by testing

Real-Time Systems. Real-Time Systems. Verification by testing. Verification by testing EDA222/DIT161 Real-Tme Systems, Chalmers/GU, 2014/2015 Lecture #8 Real-Tme Systems Real-Tme Systems Lecture #8 Specfcaton Professor Jan Jonsson Implementaton System models Executon-tme analyss Department

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

RADIX-10 PARALLEL DECIMAL MULTIPLIER

RADIX-10 PARALLEL DECIMAL MULTIPLIER RADIX-10 PARALLEL DECIMAL MULTIPLIER 1 MRUNALINI E. INGLE & 2 TEJASWINI PANSE 1&2 Electroncs Engneerng, Yeshwantrao Chavan College of Engneerng, Nagpur, Inda E-mal : mrunalngle@gmal.com, tejaswn.deshmukh@gmal.com

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

Appendix C: Pipelining: Basic and Intermediate Concepts

Appendix C: Pipelining: Basic and Intermediate Concepts Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)

More information

COSC 6385 Computer Architecture. Instruction Set Architectures

COSC 6385 Computer Architecture. Instruction Set Architectures COSC 6385 Computer Architecture Instruction Set Architectures Spring 2012 Instruction Set Architecture (ISA) Definition on Wikipedia: Part of the Computer Architecture related to programming Defines set

More information

Specifications in 2001

Specifications in 2001 Specfcatons n 200 MISTY (updated : May 3, 2002) September 27, 200 Mtsubsh Electrc Corporaton Block Cpher Algorthm MISTY Ths document shows a complete descrpton of encrypton algorthm MISTY, whch are secret-key

More information

A Model RISC Processor. DLX Architecture

A Model RISC Processor. DLX Architecture DLX Architecture A Model RISC Processor 1 General Features Flat memory model with 32-bit address Data types Integers (32-bit) Floating Point Single precision (32-bit) Double precision (64 bits) Register-register

More information

Predict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch

Predict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch branch taken Revisiting Branch Hazard Solutions Stall Predict Not Taken Predict Taken Branch Delay Slot Branch I+1 I+2 I+3 Predict Not Taken branch not taken Branch I+1 IF (bubble) (bubble) (bubble) (bubble)

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

Computer Architecture ELEC3441

Computer Architecture ELEC3441 Causes of Cache Msses: The 3 C s Computer Archtecture ELEC3441 Lecture 9 Cache (2) Dr. Hayden Kwo-Hay So Department of Electrcal and Electronc Engneerng Compulsory: frst reference to a lne (a..a. cold

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

William Stallings Computer Organization and Architecture. Chapter 11 CPU Structure and Function

William Stallings Computer Organization and Architecture. Chapter 11 CPU Structure and Function William Stallings Computer Organization and Architecture Chapter 11 CPU Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data Registers

More information

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access Agenda Cache Performance Samra Khan March 28, 217 Revew from last lecture Cache access Assocatvty Replacement Cache Performance Cache Abstracton and Metrcs Address Tag Store (s the address n the cache?

More information

55:132/22C:160, HPCA Spring 2011

55:132/22C:160, HPCA Spring 2011 55:132/22C:160, HPCA Spring 2011 Second Lecture Slide Set Instruction Set Architecture Instruction Set Architecture ISA, the boundary between software and hardware Specifies the logical machine that is

More information

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction

Instruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Instruction Level Parallelism ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Basic Block A straight line code sequence with no branches in except to the entry and no branches

More information

There are different characteristics for exceptions. They are as follows:

There are different characteristics for exceptions. They are as follows: e-pg PATHSHALA- Computer Science Computer Architecture Module 15 Exception handling and floating point pipelines The objectives of this module are to discuss about exceptions and look at how the MIPS architecture

More information

Instr. execution impl. view

Instr. execution impl. view Pipelining Sangyeun Cho Computer Science Department Instr. execution impl. view Single (long) cycle implementation Multi-cycle implementation Pipelined implementation Processing an instruction Fetch instruction

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Evaluation of Parallel Processing Systems through Queuing Model

Evaluation of Parallel Processing Systems through Queuing Model ISSN 2278-309 Vkas Shnde, Internatonal Journal of Advanced Volume Trends 4, n Computer No.2, March Scence - and Aprl Engneerng, 205 4(2), March - Aprl 205, 36-43 Internatonal Journal of Advanced Trends

More information

Speeding Up DLX Computer Architecture Hadassah College Spring 2018 Speeding Up DLX Dr. Martin Land

Speeding Up DLX Computer Architecture Hadassah College Spring 2018 Speeding Up DLX Dr. Martin Land Speeding Up DLX 1 DLX Execution Stages Version 1 Clock Cycle 1 I 1 enters Instruction Fetch (IF) Clock Cycle2 I 1 moves to Instruction Decode (ID) Instruction Fetch (IF) holds state fixed Clock Cycle3

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

Pipelining. CS701 High Performance Computing

Pipelining. CS701 High Performance Computing Pipelining CS701 High Performance Computing Student Presentation 1 Two 20 minute presentations Burks, Goldstine, von Neumann. Preliminary Discussion of the Logical Design of an Electronic Computing Instrument.

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Execution/Effective address

Execution/Effective address Pipelined RC 69 Pipelined RC Instruction Fetch IR mem[pc] NPC PC+4 Instruction Decode/Operands fetch A Regs[rs]; B regs[rt]; Imm sign extended immediate field Execution/Effective address Memory Ref ALUOutput

More information

An Efficient Algorithm for PC Purchase Decision System

An Efficient Algorithm for PC Purchase Decision System Proceedngs of the 6th WSAS Internatonal Conference on Instrumentaton, Measurement, Crcuts & s, Hangzhou, Chna, Aprl 15-17, 2007 216 An ffcent Algorthm for PC Purchase Decson Huay Chang Department of Informaton

More information

Chapter 2: Instructions How we talk to the computer

Chapter 2: Instructions How we talk to the computer Chapter 2: Instructions How we talk to the computer 1 The Instruction Set Architecture that part of the architecture that is visible to the programmer - instruction formats - opcodes (available instructions)

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

DLK Pro the all-rounder for mobile data downloading. Tailor-made for various requirements.

DLK Pro the all-rounder for mobile data downloading. Tailor-made for various requirements. DLK Pro the all-rounder for moble data downloadng Talor-made for varous requrements www.dtco.vdo.com Smply brllant, brllantly smple Always the rght soluton The DLK Pro s the VDO product famly, whch sets

More information

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program.

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program. Performance COMP375 Computer Architecture and dorganization What is Good Performance Which is the best performing jet? Airplane Passengers Range (mi) Speed (mph) Boeing 737-100 101 630 598 Boeing 747 470

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

AP PHYSICS B 2008 SCORING GUIDELINES

AP PHYSICS B 2008 SCORING GUIDELINES AP PHYSICS B 2008 SCORING GUIDELINES General Notes About 2008 AP Physcs Scorng Gudelnes 1. The solutons contan the most common method of solvng the free-response questons and the allocaton of ponts for

More information

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)

More information

Loop Transformations, Dependences, and Parallelization

Loop Transformations, Dependences, and Parallelization Loop Transformatons, Dependences, and Parallelzaton Announcements Mdterm s Frday from 3-4:15 n ths room Today Semester long project Data dependence recap Parallelsm and storage tradeoff Scalar expanson

More information