Review of Basic Computer Architecture
- Cornelius Lester
- 5 years ago
Computer Architecture: What Is Computer Architecture?
From Wikipedia, the free encyclopedia: "In computer science and engineering, computer architecture refers to specification of the relationship between different hardware components of a computer system. It may also refer to the practical art of defining the structure and relationship of the subcomponents of a computer."
"This article needs attention from an expert in computer science."
Computer Architecture: What Is Computer Architecture? Definition 2.0
"In computer science and engineering, computer architecture refers to the study of performance in computer systems. It also refers to the practical science of applying performance theory to specifying the structure and relationship of the subcomponents of a computer."
From an expert in computer science.
Theory, Goals, Specification
- Theory
  - Computation components: CPU = ALU + memory + control
  - Instructions
  - Performance = run-time speed. Run time of what? Compared to what?
- von Neumann architecture (diagram): input, memory, output, Arithmetic Logic Unit (ALU), data/instruction path, control path
- Goals (requirements): word processing, number crunching, gaming, web server, real-time controller
- Specification: requirements + performance theory → component implementation
Requirements Relative to Application
- Fastest CPU: Intel Xeon Gold
  - High-end version of the Intel x86-64 processor family
  - IA-32 instruction set on an enhanced P6 micro-architecture
  - Lineage: Pentium II, Pentium III, Pentium 4 (NetBurst), then multicore (Ivy Bridge, Haswell, Broadwell, Skylake)
- Fastest supercomputer: IBM Summit
  - 4,608 compute nodes, each with 2 Power9 22-core CPUs and 6 NVIDIA Tesla V100 GPUs (graphics processing units)
  - 202,752 CPU cores + 27,648 GPUs
  - 10 PB memory = 10 mega-GB (10,000 TB)
- Energy efficient: GFlops/watt
  - Smartphones: ARM CPU, low power, higher performance/watt than x86
Fundamental Architectural Abstractions
- Digital computer: machine that can be programmed to process symbols
- Data: symbol with no intrinsic meaning to the machine
  - User imposes meaning: integer, float, string, ...
- Operation: symbol describing processing of data symbols
  - Machine interprets meaning: transfer, ALU, control, OS, ...
- Instruction: symbol describing an operation on data
  - Machine language = collection of legal instructions
- Addressing mode: specifies data location as an operand
  - Source operand → data input to operation
  - Destination operand → data output from operation
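The point that data has no intrinsic meaning can be shown with a minimal Python sketch. It is an illustration and not part of the original slides. The same four bytes are read as a float, as a signed integer, and as raw bytes. It uses only the standard `struct` module, and little-endian byte order is an assumption.

```python
import struct

# Four bytes with no intrinsic meaning: the IEEE 754 single-precision
# encoding of 1.0, packed little-endian (byte order is an assumption).
raw = struct.pack("<f", 1.0)

# The "user" imposes meaning by choosing an interpretation.
as_float = struct.unpack("<f", raw)[0]   # 1.0
as_int = struct.unpack("<i", raw)[0]     # 1065353216 (0x3F800000)
as_bytes = list(raw)                     # [0, 0, 128, 63]

print(as_float, as_int, hex(as_int), as_bytes)
```

The bits never change. Only the instruction or program that reads them decides whether they mean 1.0 or 1065353216.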
Stages in Computer Design
- Instruction Set Architecture (ISA)
  1. Define the universe of problems to be solved
  2. Study candidate operations at the level of the system programmer
     - Atomic operations complete sequentially
     - General operation = combination of atomic operations
  3. Specify the instruction set for the machine language
     - Choose a minimum set of orthogonal operations
     - Not too many ways to solve the same problem
- Implementation
  1. Design the machine as an implementation of the ISA
  2. Evaluate theoretical performance
  3. Identify performance problem areas
  4. Improve processor efficiency
Typical Operations
- Data transfer: load (r ← m), store (m ← r), move (r/m → r/m), convert data types
- Arithmetic/logical (ALU): integer arithmetic (+, −, ×, ÷, compare, shift) and logical (AND, OR, NOR, XOR)
- Decimal: integer arithmetic on decimal numbers
- Floating point (FPU): floating-point arithmetic (+, −, ×, ÷, sqrt, trig, exp, ...)
- String: string move, string compare, string search
- Control: conditional and unconditional branch, call/return, trap
- Operating system: system calls, virtual memory management instructions
- Graphics: pixel operations, compression/decompression operations
Memory Hierarchy
- Long-term storage: memory location outside CPU and RAM; stores data and instructions of "all" programs; organized by OS; holds all files and data
- Main memory (RAM): memory location outside CPU; stores "all" data and instructions of running programs; organized by addresses; holds running programs and data
- Cache: memory location in or near CPU; fast access to important data and instructions from RAM; a copy of a RAM section; holds the next few instructions and data
- Register: memory location inside CPU; fast access to a small amount of information; organized by CPU; holds current data
CPU and Memory Hierarchy
CPU controller accesses L1 cache:
    if (L1 cache hit) { access performed in 1 clock cycle }
    else {
        CPU ← L1 cache miss
        L1 cache accesses cache controller
        cache controller initiates access to L2 and main memory
        if (address in L2 cache) { controller copies contents to L1 from L2 }
        else { controller copies location to L1 from main memory }
    }
[Figure: ALU and registers access the L1 instruction and L1 data caches in 1 CC. The cache controller handles requests and updates from L2, main memory, I/O, and disk, with access latency >> 1 clock cycle.]
Cache miss penalty: address not in L1 → delay in memory access
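The lookup above can be sketched as a small Python model. Several details are illustrative assumptions and not figures from the slides: latencies of 1, 10, and 100 clock cycles, unbounded dictionaries standing in for the caches, and also filling L2 on a main-memory access. The sketch only shows the control flow and how a miss adds penalty cycles.

```python
# Assumed latencies in clock cycles (illustrative only).
L1_LATENCY, L2_LATENCY, MEM_LATENCY = 1, 10, 100

def access(address, l1, l2, memory):
    """Return (value, clock cycles) for one read through L1, L2, main memory."""
    if address in l1:                      # L1 cache hit: 1 clock cycle
        return l1[address], L1_LATENCY
    # L1 miss: the cache controller goes to L2, then to main memory.
    if address in l2:
        l1[address] = l2[address]          # copy contents to L1 from L2
        return l1[address], L1_LATENCY + L2_LATENCY
    value = memory[address]
    l2[address] = value                    # also fill L2 (an assumption)
    l1[address] = value                    # copy location to L1 from main memory
    return value, L1_LATENCY + L2_LATENCY + MEM_LATENCY

memory = {0x100: 42}
l1, l2 = {}, {}
first = access(0x100, l1, l2, memory)    # (42, 111): misses in L1 and L2
second = access(0x100, l1, l2, memory)   # (42, 1): L1 hit
l1.clear()                               # evict from L1 only
third = access(0x100, l1, l2, memory)    # (42, 11): L1 miss, L2 hit
```

The gap between `second` and `first` is the cache miss penalty described on the slide.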
Specifying Operands
- Immediate: constant = IMM = numerical value coded into the instruction
- Register operands
  - register name = a CPU storage location
  - REGS[register name] = data stored in register
  - REGS[R3] = data stored in register R3
- Memory operands
  - address = a memory storage location
  - MEM[address] = data stored in memory
  - MEM[EA] = data stored at address EA = effective address
- Pointer arithmetic: if REGS[R3] = &(variable), then MEM[REGS[R3]+4] = *(&(variable)+4) = *(REGS[R3]+4)
Addressing Modes
Mode: syntax → memory access (typical use); d = size of the data element
- Register: R3 → Regs[R3] (register data)
- Immediate: #3 → 3 (constant)
- Direct (absolute): (1001) → Mem[1001] (static data)
- Register deferred: (R1) → Mem[Regs[R1]] (pointer)
- Displacement: 100(R1) → Mem[100+Regs[R1]] (local variable)
- Indexed: (R1 + R2) → Mem[Regs[R1]+Regs[R2]] (array addressing)
- Memory indirect: @(R3) → Mem[Mem[Regs[R3]]] (pointer to pointer)
- Autoincrement: (R2)+ → Mem[Regs[R2]], then Regs[R2] ← Regs[R2]+d (stack access)
- Autodecrement: -(R2) → Regs[R2] ← Regs[R2]-d, then Mem[Regs[R2]] (stack access)
- Scaled: 100(R2)[R3] → Mem[100+Regs[R2]+Regs[R3]*d] (indexing arrays)
- PC-relative: (PC) → Mem[PC+value] (load instruction to data register)
- PC-relative deferred: 1001(PC) → Mem[PC+Mem[1001]] (load instruction to data register)
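Here is a hypothetical Python helper that evaluates the operand for most of the addressing modes listed above, with REGS and MEM modeled as dictionaries. The function name `operand`, its keyword arguments, and the element size d = 4 bytes are assumptions made for illustration. The PC-relative modes are left out to keep it short.

```python
D = 4  # assumed size of the data element d, in bytes

def operand(mode, regs, mem, **a):
    """Return the operand value for one addressing mode (illustrative)."""
    if mode == "register":            # R3 -> Regs[R3]
        return regs[a["r"]]
    if mode == "immediate":           # #3 -> 3
        return a["imm"]
    if mode == "direct":              # (1001) -> Mem[1001]
        return mem[a["addr"]]
    if mode == "register_deferred":   # (R1) -> Mem[Regs[R1]]
        return mem[regs[a["r"]]]
    if mode == "displacement":        # 100(R1) -> Mem[100 + Regs[R1]]
        return mem[a["disp"] + regs[a["r"]]]
    if mode == "indexed":             # (R1 + R2) -> Mem[Regs[R1] + Regs[R2]]
        return mem[regs[a["r"]] + regs[a["r2"]]]
    if mode == "memory_indirect":     # @(R3) -> Mem[Mem[Regs[R3]]]
        return mem[mem[regs[a["r"]]]]
    if mode == "autoincrement":       # (R2)+ -> Mem[Regs[R2]], then Regs[R2] += d
        value = mem[regs[a["r"]]]
        regs[a["r"]] += D
        return value
    if mode == "autodecrement":       # -(R2) -> Regs[R2] -= d, then Mem[Regs[R2]]
        regs[a["r"]] -= D
        return mem[regs[a["r"]]]
    if mode == "scaled":              # 100(R2)[R3] -> Mem[100 + Regs[R2] + Regs[R3]*d]
        return mem[a["disp"] + regs[a["r"]] + regs[a["r2"]] * D]
    raise ValueError(f"unknown addressing mode: {mode}")

regs = {"R1": 1000, "R2": 2000, "R3": 2}
mem = {5: 13, 1000: 5, 1100: 8, 2000: 11, 2108: 12}
disp_val = operand("displacement", regs, mem, disp=100, r="R1")        # Mem[1100] = 8
scaled_val = operand("scaled", regs, mem, disp=100, r="R2", r2="R3")  # Mem[2108] = 12
ind_val = operand("memory_indirect", regs, mem, r="R1")               # Mem[Mem[1000]] = 13
pop_val = operand("autoincrement", regs, mem, r="R2")                 # Mem[2000] = 11; R2 -> 2004
```

Autoincrement and autodecrement are the only modes here that change architectural state (a register) as a side effect. That is one reason they complicate instruction implementation.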
Commitment to State
- Internal registers: temporary registers used in executing machine instructions; not visible to programs
- Architectural state: CPU registers visible to programs
- System state: all data resources visible to programs = architectural state + system memory
- Commitment to state: update of system state = write to architectural state / system memory
Complex Instruction Set Computer (CISC)
- Classic machine design: 300 instruction types, 15 addressing modes, 10 data types; complex machine implementations
- Mainframes: large, expensive, centralized computers for big business and government
  - Manufacturers: IBM, Control Data, Burroughs, Honeywell
- Minicomputers: smaller computers for smaller organizations
  - Manufacturers: Digital (PDP/VAX), Data General (Eclipse)
- CISC microprocessors
  - 6800 (1974) and 8086 (1978) designed as tiny CISC on a chip
  - Apple II (1977) ← 6502 (1975)
  - IBM PC (1981) ← 8088 (1979)
  - Intel x86 for PC/Mac = last CISC ISA still manufactured
Why CISC?
- Semantic gap argument
  - Computer language should imitate natural language
  - Large vocabulary + high redundancy → flexibility + power
- Terrible compilers
  - Limited optimization
  - Limited error messaging
  - Efficient code written or optimized in assembly language
- Expensive memory
  - RAM ~ $5000/MB wholesale in 1977
  - RAM ~ $0.01/MB in 2012
- Implications for machine language
  - Design for user-friendly programming and small memory use
  - Many highly specific instructions using many addressing modes
  - Compact instruction codes that perform a lot of work
Physical Implementation of CISC
[Figure: generic machine. An ALU subsystem (registers IN 1, IN 2, OUT; ALU operation; ALU result flag), the status word, and the decoder/control unit with IR and PC connect over the system bus to MAR and MDR, which drive the address and data lines of main memory.]
PC = program counter; IR = instruction register; MAR = memory address register; MDR = memory data register
Decoding Machine Instructions
Machine language instruction: SUB R1, R2, 100(R3)
Microcode instruction sequence (microprogram). Each microcode instruction is a hardware-level atomic operation:
    ALU_IN ← R3
    ALU_IN ← 100
    ALU ← ADD
    MAR ← OUT
    READ
    ALU_IN ← MDR
    ALU_IN ← R2
    ALU ← SUB
    R1 ← OUT
9 lines = 9 clock cycles
[Figure: the generic CISC machine (ALU subsystem, system bus, status word, decoder, IR, PC, MAR, MDR, main memory) executing this microprogram.]
Run Time and Clock Cycles
- The CPU is timed by a periodic signal called the clock (CLK)
- Clock cycle (CC) time = seconds per cycle
- An instruction requires 1 or more clock cycles to process
- Clock rate = cycles per second = Hz (hertz)
$$\text{run time} = \text{clock cycles to run program} \times \frac{\text{seconds}}{\text{clock cycle}} = \frac{\text{clock cycles to run program}}{\text{clock cycles per second}}$$
- Higher clock rate → shorter run time
- More clock cycles (at constant clock rate) → longer run time
Intel 386 Microprocessor
[Figure: Intel 386 microprocessor]
Basic Performance Measures
- Run time: elapsed time $T$ from start to finish of a defined program task
- Latency: excess response time; depends on context
- Throughput: number of defined tasks performed per unit time
$$\text{Throughput} = \frac{1}{T + \text{latency between tasks}}$$
- Enhancement: change to system → new run time $T'$
$$S = \frac{T}{T'}, \qquad S > 1 \iff T' < T$$
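Both measures translate directly into code. The Python sketch below uses hypothetical numbers (a 2.0 s task, 0.5 s between tasks, and an enhancement that cuts run time to 1.6 s). The function names are illustrative.

```python
def throughput(run_time, latency_between_tasks=0.0):
    """Tasks per unit time: 1 / (T + latency between tasks)."""
    return 1.0 / (run_time + latency_between_tasks)

def speedup(old_run_time, new_run_time):
    """S = T / T'; S > 1 exactly when T' < T."""
    return old_run_time / new_run_time

tp = throughput(2.0, 0.5)   # 0.4 tasks per second
s = speedup(2.0, 1.6)       # 1.25
```

An enhancement that shortens $T$ raises throughput only in proportion to $T$ + latency. When latency between tasks dominates, a large speedup may barely change throughput.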
Definitions
- $T$ = total run time of program
- $t_i$ = total run time of instructions in group $i$
- $IC_i$ = number of instructions in group $i$ (instruction count)
- $CPI_i$ = number of clock cycles to run 1 instruction in group $i$ (cycles per instruction)
- $N_i$ = number of clock cycles to run all instructions in group $i$
- $\tau$ = seconds per clock cycle
- $R = 1/\tau$ = clock rate = clock frequency = clock cycles per second = hertz (Hz)
- $IC$ = total number of instructions in program
- $N$ = total number of clock cycles to run program
- $CPI$ = average number of clock cycles per instruction for the program
- quantity$'$ = new value of quantity after an architectural change
CPU Equation
Clock cycles to run all instructions of type $i$:
$$N_i = (\text{instructions of type } i) \times \frac{\text{clock cycles}}{\text{instruction of type } i} = IC_i \times CPI_i$$
Total clock cycles to run all instructions in program:
$$N = \sum_{\text{all groups}} N_i = \sum_i IC_i \times CPI_i$$
Average number of clock cycles per instruction for program:
$$CPI = \frac{\text{total clock cycles to run program}}{\text{total instructions in program}} = \frac{N}{IC} = \frac{1}{IC}\sum_i IC_i \times CPI_i = \sum_i \frac{IC_i}{IC} \times CPI_i$$
$CPI$ is a weighted average. Since $\sum_i IC_i/IC = 1$, the ratio $IC_i/IC$ is the proportion (percent) of instructions in group $i$.
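CPU Run Time
Run time of one instruction of type $i$:
$$\frac{\text{clock cycles}}{\text{instruction of type } i} \times \frac{\text{seconds}}{\text{clock cycle}} = CPI_i \times \tau$$
Run time for all instructions of type $i$:
$$t_i = IC_i \times CPI_i \times \tau$$
Total run time for program:
$$T = \sum_i t_i = \tau \sum_i CPI_i \times IC_i = IC \times \tau \times \sum_i CPI_i \times \frac{IC_i}{IC} = CPI \times IC \times \tau$$
So
$$T = CPI \times IC \times \tau = \frac{\text{clock cycles}}{\text{instruction}} \times \text{number of instructions} \times \frac{\text{seconds}}{\text{clock cycle}}$$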
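Here is a worked instance of the CPU equation in Python. The instruction mix (groups, counts, CPIs) and the 1 GHz clock are made up for illustration and do not come from the slides. The script computes $N$, computes the average $CPI$ both as $N/IC$ and as a weighted average, and checks that $\sum_i t_i = CPI \times IC \times \tau$.

```python
# Hypothetical instruction mix: group -> (IC_i, CPI_i)
mix = {
    "load/store": (300_000, 2),
    "alu":        (500_000, 1),
    "branch":     (200_000, 3),
}
clock_rate_hz = 1e9               # R = 1 / tau (assumed 1 GHz)
tau = 1.0 / clock_rate_hz         # seconds per clock cycle

IC = sum(ic for ic, _ in mix.values())                  # total instructions
N = sum(ic * cpi for ic, cpi in mix.values())           # N = sum_i IC_i * CPI_i
CPI = N / IC                                            # average CPI
CPI_weighted = sum((ic / IC) * cpi for ic, cpi in mix.values())

t = {group: ic * cpi * tau for group, (ic, cpi) in mix.items()}  # t_i
T_from_groups = sum(t.values())                         # T = sum_i t_i
T = CPI * IC * tau                                      # T = CPI * IC * tau

print(f"IC={IC}, N={N}, CPI={CPI:.2f}, T={T * 1e3:.3f} ms")
```

Computing $CPI$ both ways makes the "weighted average" reading concrete. Groups that are frequent or slow (large $IC_i/IC$ or large $CPI_i$) dominate the result.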
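Amdahl Equation
- $F_i = t_i / T$ = relative run time of instructions in group $i$
- $S_i = t_i / t_i'$ = speedup for instructions in group $i$
Enhancement to group $e$ (other groups unchanged):
$$T' = (T - t_e) + t_e' = (1 - F_e)\,T + \frac{F_e\,T}{S_e}$$
$$S = \frac{T}{T'} = \frac{1}{(1 - F_e) + \dfrac{F_e}{S_e}}$$
Amdahl's "Law"
- Speedup limited by $1/(1 - F_e)$, the value approached as $S_e \to \infty$
- Enhance the group with maximum $F_e$
- Accept impairment to the small remainder $1 - F_e$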
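The Amdahl equation and its limit can be transcribed directly into Python. The example below uses hypothetical numbers: 80% of run time is made 4 times faster.

```python
def amdahl_speedup(f_e, s_e):
    """Overall speedup S = 1 / ((1 - F_e) + F_e / S_e)."""
    return 1.0 / ((1.0 - f_e) + f_e / s_e)

def amdahl_limit(f_e):
    """Upper bound on speedup as S_e grows without limit: 1 / (1 - F_e)."""
    return 1.0 / (1.0 - f_e)

s = amdahl_speedup(0.8, 4)   # 1 / (0.2 + 0.2) = 2.5
cap = amdahl_limit(0.8)      # 5.0: no value of S_e can exceed this
```

Even an infinitely fast enhancement of the 80% leaves the unenhanced 20% in place, so the overall speedup can never exceed 5.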
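Amdahl Equation in Parallel Processing
- $F_P$ = fraction of processing that can be performed independently
- $n$ = number of processing units
On one processor, split the work into a part that can be parallelized ($F_P$) and a part that cannot ($1 - F_P$):
$$CPI_{n=1} = CPI_{n=1}\,F_P + CPI_{n=1}\,(1 - F_P)$$
On $n$ processors, only the parallel part is divided by $n$:
$$CPI_n = CPI_{n=1}\left[(1 - F_P) + \frac{F_P}{n}\right]$$
$$S = \frac{CPI_{n=1} \times IC \times \tau}{CPI_n \times IC \times \tau} = \frac{CPI_{n=1}}{CPI_n} = \frac{1}{(1 - F_P) + \dfrac{F_P}{n}}$$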
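The parallel form is the Amdahl equation with $F_e = F_P$ and $S_e = n$. The short Python sketch below tabulates it for a hypothetical $F_P = 0.95$. The speedup saturates near $1/(1 - F_P) = 20$ no matter how many processors are added.

```python
def parallel_speedup(f_p, n):
    """S = CPI_{n=1} / CPI_n = 1 / ((1 - F_P) + F_P / n)."""
    return 1.0 / ((1.0 - f_p) + f_p / n)

for n in (1, 2, 4, 16, 64, 1024):
    print(n, round(parallel_speedup(0.95, n), 2))
```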
SPEC Benchmark
- Programs for system performance measurement + comparison
  - Standard + repeatable
  - Test system under realistic conditions
  - Summary score for easy comparison
  - Results posted publicly
- Specific test suites
  - Cint: CPU integer instructions
  - Cfp: CPU floating-point instructions
  - Performance as file server, web server, mail server, graphics
- Updated every few years to reflect realistic conditions
  - Based on current statistical distributions of computing tasks
  - Current CPU test version: 2017; previous version: 2006
- Reports speedup: run time compared with a standard machine
How SPEC Works
- User runs $n$ programs on the test machine
  - Records run-time conditions
  - Records program run time in seconds: $T_i^{\text{test}}$, $i = 1, 2, \ldots, n$
- SPEC provides run times on the reference machine, $T_i^{\text{ref}}$
  - Sun Fire V490: 2100 MHz UltraSPARC-IV+ processor
  - Powerful symmetric multiprocessing (SMP) server
- User calculates speedup for each program: $S_i = T_i^{\text{ref}} / T_i^{\text{test}}$, $i = 1, 2, \ldots, n$
- User calculates the geometric mean of the speedups:
$$S(\text{test machine on ref}) = \left(\prod_{i=1}^{n} \frac{T_i^{\text{ref}}}{T_i^{\text{test}}}\right)^{1/n}$$
$$S(\text{machine A compared to machine B}) = \frac{S(\text{machine A on ref})}{S(\text{machine B on ref})}$$
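The scoring steps can be sketched in Python. The run times below are hypothetical, not real SPEC data, and `spec_ratio` is an illustrative name rather than part of any SPEC tool. The sketch also shows why the geometric mean is used: the A-versus-B comparison comes out the same whichever reference machine is chosen.

```python
import math

def spec_ratio(t_ref, t_test):
    """Geometric mean of the per-program speedups S_i = T_i^ref / T_i^test."""
    if len(t_ref) != len(t_test) or not t_ref:
        raise ValueError("need one test time per reference time")
    logs = [math.log(r / t) for r, t in zip(t_ref, t_test)]
    return math.exp(sum(logs) / len(logs))

# Hypothetical run times in seconds.
t_ref = [1000.0, 2000.0, 4000.0]   # reference machine
t_a = [100.0, 400.0, 500.0]        # machine A: speedups 10, 5, 8
t_b = [200.0, 400.0, 1000.0]       # machine B: speedups 5, 5, 4

s_a = spec_ratio(t_ref, t_a)       # (10 * 5 * 8) ** (1/3), about 7.37
s_b = spec_ratio(t_ref, t_b)       # (5 * 5 * 4) ** (1/3), about 4.64
a_vs_b = s_a / s_b                 # about 1.59
```

`a_vs_b` equals `spec_ratio(t_b, t_a)`, the geometric mean of B's times over A's. The reference times cancel out.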
Typical SPEC Report 1
SPEC(R) CPU2017 Integer Speed Result
ASUSTeK Computer Inc. ASUS RS700-E9(Z11PP-D24) Server System (2.70 GHz, Intel Xeon Gold 6150)
- CPU2017 License: 9016; Test date: Dec-2017
- Test sponsor: ASUSTeK Computer Inc.; Hardware availability: Jul-2017
- Tested by: ASUSTeK Computer Inc.; Software availability: Sep-2017
- Table columns: Benchmarks; Base Threads, Run Time, Ratio; Peak Threads, Run Time, Ratio
- Benchmarks: perlbench_s, gcc_s, mcf_s, omnetpp_s, xalancbmk_s, x264_s, deepsjeng_s, leela_s, exchange2_s, xz_s
- SPECspeed2017_int_base = 8.87
- SPECspeed2017_int_peak = 9.16
Base = standard configuration; Peak = specialist configuration
Typical SPEC Report 2
HARDWARE
- CPU Name: Intel Xeon Gold 6150
- Max MHz: 3700
- Nominal: 2700
- Enabled: 36 cores, 2 chips
- Orderable: 1, 2 chip(s)
- Cache L1: 32 KB I + 32 KB D on chip per core
- L2: 1 MB I+D on chip per core
- L3: MB I+D on chip per chip
- Other: None
- Memory: 768 GB (24 x 32 GB 2Rx4 PC4-2666V-R)
- Storage: 1 x 240 GB SATA SSD
- Other: None
SOFTWARE
- OS: Red Hat Enterprise Linux Server release 7.3 (x86_64), Kernel el7.x86_64
- Compiler: C/C++: Version of Intel C/C++ Compiler; Fortran: Version of Intel Fortran Compiler
- Parallel: Yes
- Firmware: Version 0601 released Oct-2017
- File System: xfs
- System State: Run level 3 (multi-user)
- Base Pointers: 64-bit
- Peak Pointers: 32/64-bit
- Other: jemalloc: jemalloc memory allocator library V
Some Cint2017 Results
Table columns: Processor; Clock (GHz); Total Chips; Total Cores; Total Threads; Cint2017 Base; Cint2006 Base; Ratio
Processors listed: Intel Xeon Gold; Intel Xeon Gold; Intel Xeon Platinum; Intel Xeon Bronze; Intel Xeon Platinum 8180; Intel Core 2 Duo E6850 with auto parallel; Intel Core 2 Duo E6850 with no auto parallel
Some Comments on Cint2017 Results
- Auto parallel
  - High-level Cint code is not threaded for parallel processing
  - An auto-parallel compiler creates parallel threads using heuristics
  - Provides limited speedup (or even degradation)
  - All CPU results in the table use auto parallel except the last
- Intel Xeon Gold 6146 with 3.2 GHz clock
  - Fastest CPU in Cint2017 tests
  - 2 chips (24 threads) slightly faster than 4 chips (48 threads): communication between more threads can slow processing
  - 4 chips faster on Cint2006 (using different benchmark programs)
- Intel Xeon Platinum 8152 with 2.0 GHz clock
  - Cint with 64 threads = 7.00
  - With a 3.2 GHz clock, expect Cint = 7 × 3.2 GHz / 2.0 GHz = 11.2
  - Not much better than Gold 6146 with 24 threads
- Core 2 Duo E6850: old processor, not tested on Cint2017
  - Cint2006 with 1 thread (no auto parallel) = 18.7
  - Cint2006 with 2 threads (auto parallel) = 19.9, so 19.9 / 18.7 ≈ 1.06, a 6% speedup
Benchmarking a Processor Design
- Specify instruction set architecture (ISA)
  - Specifies machine language for proposed CPU
  - Provides human-readable assembly language
- Determine CPI for each instruction group
  - Count clock cycles required to implement each instruction in the ISA
- Write compilers for proposed machine language: C, C++, Fortran
- Compile benchmark programs to machine language: programs from SPEC CINT and CFP
- Analyze compiler output (executable programs)
  - Sort machine instructions into groups
  - Calculate relative instruction count $IC_i/IC$ for each group
- Calculate average CPI and overall run time $T$
- Compare run time with reference machine
CISC Creates Anti-CISC Revolution
- Data General introduces Eclipse 32-bit CISC minicomputer
- Digital (DEC) introduces VAX 32-bit CISC minicomputer
- First serious inexpensive competition to mainframe computers
  - Serious computers became available to small organizations
  - UNIX developed as minicomputer operating system
  - TCP/IP developed to support networks of minicomputers
- Computer science emerged as a separate academic discipline
  - Students needed topics for projects, theses, dissertations
- Research results on minicomputer performance
  - CISC uses machine resources inefficiently
  - Most machine instructions are rarely used in programs
  - CISC machines run slowly to support unnecessary features
RISC "Philosophy"
- Technological developments from 1975 to 1990
  - Price of RAM drops from $5000/MByte (1975) to $5/MByte (1990)
  - Compilers become powerful and efficient with extensive optimization
  - Portable code made practical by minicomputers, Unix, C, and TCP/IP
- Principal research results on CISC performance
  - ~90% of run time devoted to ~10% of the instruction set
  - ~90% of instructions in the ISA rarely used
- Reduced Instruction Set Computer (RISC)
  - Apply Amdahl's "Law" to the CISC ISA: speed up operations accounting for most of run time; ignore impairments to other instructions
  - RISC ISA → only the most important CISC instructions
  - Other CISC instructions = multiple RISC instructions
  - RISC implementation executes its ISA in fast dedicated hardware
Instruction Types
Representative instruction distribution: five programs from the SPECint92 benchmark suite, compiled for the x86 instruction set (ISA for Intel 386/486/Pentium)
Instruction: relative proportion of total run time
- Load: 22%
- Conditional branch: 20%
- Compare: 16%
- Store: 12%
- Add: 8%
- And: 6%
- Sub: 5%
- Move reg-reg: 4%
- Call: 1%
- Return: 1%
- Other: 5%
- Total: 100%
The first 10 instructions account for 95% of run time.
- Amdahl's "Law": fast implementation of the 95%; the other 5% will not seriously degrade performance
- Must include unconditional branch for completeness
Ref: Hennessy / Patterson, figure
RISC Microprocessors
- Simpler ISA: small set of uniform-length machine instructions
- Simpler hardware
  - No microcode → standard instruction implementation
  - No central system bus → CPU processes several instructions at once
- Lower CPI + higher clock speed: an instruction completes on (almost) every clock cycle
- All processors today use RISC technology
  - Pure RISC (PowerPC, SPARC, MIPS, ARM, ...)
  - RISC technology for a CISC language (Pentium II through Pentium 4, Centrino, Core)
  - Explicitly parallel RISC (Intel Itanium, IBM mainframes)
Typical RISC ISA
- Data types: 32-bit / 64-bit integer and floating point
- Flat memory model with 32-bit / 64-bit address
- Address mode: disp(Rn) ~ Mem[Regs[Rn] + disp]
- Register-register operation model
  - Integer registers
  - FP registers
  - OS (kernel mode) registers
  - Result flags
  - Read-only (value = 0) and write-only (null) registers
- Instruction types
  - Load, store, move register-register
  - Integer add, sub, mult, div, shift, compare
  - Boolean and, or, xor
  - Floating point add, sub, mult, div, sqrt, compare
  - Jump, jump register, jump and link, conditional branch
Typical Instruction Encoding
Instruction types for the Alpha 64-bit RISC processor:
- PALcode type: Opcode | Number
- Branch type: Opcode | Ra | Disp
- Memory type: Opcode | Ra | Rb | Disp
- Operate type: Opcode | Ra | Rb | Function | Rc
Fields and semantics:
- Opcode (6 bits) identifies the operation to the CPU
- Ra, Rb (5 bits) identify register names; 5 bits select one of 32 registers, R0 to R31
- PALcode (Privileged Architecture Library): hardware support for OS
- Branch: test Ra; if true, Ra ← PC and PC ← PC + Disp
- Memory: move between Ra and Mem[Regs[Rb] + Disp]
- Operate R/R: Rc ← Ra function Rb (register name)
- Operate R/I: Rc ← Ra function Imm (Imm occupies Rb and 3 bits of the function field)
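A Python sketch of packing and unpacking the Alpha memory format into a 32-bit instruction word follows. The bit positions are an assumption taken from the Alpha architecture convention and are not stated on the slide: opcode in bits 31:26, Ra in 25:21, Rb in 20:16, and a signed 16-bit Disp in 15:0. The opcode value 0x29 is used here only as an example.

```python
def encode_memory(opcode, ra, rb, disp):
    """Pack an Alpha memory-format instruction (assumed field layout)."""
    if not (0 <= opcode < 64 and 0 <= ra < 32 and 0 <= rb < 32):
        raise ValueError("opcode needs 6 bits; Ra and Rb need 5 bits each")
    if not -32768 <= disp < 32768:
        raise ValueError("Disp must fit in a signed 16-bit field")
    return (opcode << 26) | (ra << 21) | (rb << 16) | (disp & 0xFFFF)

def decode_memory(word):
    """Unpack the fields; Disp is sign-extended from 16 bits."""
    opcode = (word >> 26) & 0x3F     # 6 bits
    ra = (word >> 21) & 0x1F         # 5 bits: R0..R31
    rb = (word >> 16) & 0x1F         # 5 bits: R0..R31
    disp = word & 0xFFFF
    if disp & 0x8000:                # negative displacement
        disp -= 0x10000
    return opcode, ra, rb, disp

word = encode_memory(0x29, 1, 30, -8)   # Ra = R1, Rb = R30, Disp = -8
fields = decode_memory(word)            # (0x29, 1, 30, -8)
```

Because every field sits at a fixed position, decoding is just shifts and masks. This is the hardware simplicity that uniform-length RISC encodings buy.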
Simple RISC Physical Implementation
[Figure: early PowerPC implementation. Stage 1 Instruction Fetch (instruction memory), Stage 2 Instruction Decode, Stage 3 Execute, Stage 4 Data Memory Access / Write Back (data memory).]
- No system bus → instructions proceed from left to right (assembly line)
- Separate cache memory for instructions and data
- Simple repetitive operations
  1. Fetch uniform-length instructions
  2. Instruction decode → read source operands from registers
  3. Execute ALU instructions and calculate addresses
  4. Access memory and/or write destination operands (commit to state)
- One CC per stage per instruction → 4 clock cycles per instruction
Pipelining: The RISC Advantage
- Instruction-level parallelism (ILP)
  - Hardware starts the second instruction before the first completes
  - Typically 4 instructions in various stages of execution at one time
[Figure: the 4-stage pipeline (Instruction Fetch, Instruction Decode, Execute, Data Memory Access / Write Back) with separate instruction and data memories.]
CC | Stage 1 | Stage 2 | Stage 3 | Stage 4
1 | I1 | - | - | -
2 | I2 | I1 | - | -
3 | I3 | I2 | I1 | -
4 | I4 | I3 | I2 | I1
5 | I5 | I4 | I3 | I2
6 | I6 | I5 | I4 | I3
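The occupancy table above follows a single rule: in an ideal pipeline, instruction $k$ is in stage $s$ on clock cycle $k + s - 1$. The Python sketch below regenerates the table from that rule. The function name is illustrative, and the sketch assumes no stalls.

```python
def pipeline_table(num_instructions, num_stages, num_cycles):
    """Row c lists the occupant of each stage on clock cycle c + 1.

    Ideal pipeline (no stalls): instruction k (1-based) enters stage 1 on
    cycle k and advances one stage per clock cycle; None = empty stage.
    """
    rows = []
    for cc in range(1, num_cycles + 1):
        row = []
        for stage in range(1, num_stages + 1):
            k = cc - stage + 1                  # instruction now in this stage
            row.append(f"I{k}" if 1 <= k <= num_instructions else None)
        rows.append(row)
    return rows

for cc, row in enumerate(pipeline_table(6, 4, 6), start=1):
    print(cc, *[s or "-" for s in row])
```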
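Instruction-Oriented View
[Figure: instructions I1 to I6 versus clock cycles. Each instruction runs IF, ID, EX, W on consecutive clock cycles and starts one cycle after the previous one.]
$$N_{\text{ideal}} = IC + (\text{pipeline length} - 1)$$
$$CPI_{\text{ideal}} = \frac{N_{\text{ideal}}}{IC} = \frac{IC + (\text{pipeline length} - 1)}{IC} = 1 + \frac{\text{pipeline length} - 1}{IC} \approx 1 \quad (IC\ \text{large})$$
$$T = CPI_{\text{ideal}} \times IC \times \tau \approx IC \times \tau = \frac{IC}{\text{clock rate}} \quad (IC\ \text{large})$$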
Pipeline Imbalance
[Figure: the 4-stage pipeline (Instruction Fetch, Instruction Decode, Execute, Data Memory Access / Write Back) with separate instruction and data memories.]
- An instruction executes in 4 clock cycles
- Clock cycle time is determined by the LOAD instruction (longest execution time):
$$\tau_{\text{fetch}} \approx \tau_{\text{decode}} \approx \tau_{\text{execute}} \approx \tau_{\text{memory access}} \approx \tau_{\text{register write-back}} \approx \tau_{\text{minimum}}$$
$$\tau_{\text{clock cycle}} \ge \tau_{\text{memory access}} + \tau_{\text{register write-back}} \approx 2\,\tau_{\text{minimum}}$$
- Most instructions do not access data memory in stage 4
  - Only LOAD and STORE access data memory
  - Only LOAD performs both memory access and register write-back
  - Most operations can complete in time $\tau_{\text{minimum}}$
Superpipelining
[Figure: 5-stage pipeline: Stage 1 IF (Instruction Fetch), Stage 2 ID (Instruction Decode), Stage 3 EX (Execute), Stage 4 MEM (Data Memory Access), Stage 5 WB (Write Back), with separate instruction and data memories.]
- Divide stage 4 into two stages; only load/store do useful work in the MEM stage
- Divide clock cycle time (double clock rate): $\tau' = \tau_{\text{minimum}} = \tau/2$
- Instructions I1, I2, ... now pass through F, D, E, M, W, one cycle apart, so the ideal CPI is unchanged: $CPI'_{\text{ideal}} = CPI_{\text{ideal}} = 1$
$$S = \frac{CPI \times IC \times \tau}{CPI' \times IC' \times \tau'} = \frac{1 \times IC \times \tau}{1 \times IC \times \tau/2} = 2$$
- Programs can run twice as fast
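The factor of 2 holds only for large $IC$. The Python sketch below combines $N_{\text{ideal}} = IC + (\text{pipeline length} - 1)$ with the halved clock to show how the speedup approaches 2. It assumes no stalls and a 5-stage clock of exactly $\tau/2$, as stated above. The function names are illustrative.

```python
def ideal_cycles(ic, stages):
    """N_ideal = IC + (pipeline length - 1): fill the pipe, then 1 per cycle."""
    return ic + (stages - 1)

def ideal_cpi(ic, stages):
    """CPI_ideal = N_ideal / IC, which approaches 1 for large IC."""
    return ideal_cycles(ic, stages) / ic

def superpipeline_speedup(ic, tau):
    """Run time of a 4-stage pipeline at clock tau over a 5-stage one at tau/2."""
    t_4stage = ideal_cycles(ic, 4) * tau
    t_5stage = ideal_cycles(ic, 5) * (tau / 2)
    return t_4stage / t_5stage

for ic in (1, 10, 1000, 1_000_000):
    print(ic, ideal_cpi(ic, 5), superpipeline_speedup(ic, 1e-9))
```

A single instruction gains only 1.6 times (4τ versus 5τ/2), because the deeper pipeline takes longer to fill.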
Pipeline Hazards
- Instruction dependencies: the result of one instruction is a source for a later instruction
- Hazard condition: the processor runs uninterrupted but provides incorrect answers
- Pipeline hazard
  - Several instructions in various stages of execution
  - Pipeline uses a resource value before it is updated by an earlier instruction
- Example:
    ADD R1,R2,R3
    SUB R4,R5,R1    ; hazard if SUB reads R1 before ADD writes R1
- Hazard types
  - Structural hazard: conflict over access to a resource
  - Data hazard: instruction result not ready when needed
  - Control hazard: branch address and condition not ready when needed
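Dealing with Hazards
- Avoid error: pause the pipeline and wait for the resource to be available
  - Called a WAIT STATE or PIPELINE STALL
- Degrades processor performance: adds stall clock cycles (wasted time) to instruction execution
$$CPI = \frac{\text{processing clock cycles (ideal)} + \text{stalled clock cycles}}{\text{completed instructions}} = \frac{N_{\text{ideal}} + N_{\text{stall}}}{IC} = \frac{N_{\text{ideal}}}{IC} + \frac{N_{\text{stall}}}{IC}$$
$$CPI = CPI_{\text{ideal}} + CPI_{\text{stall}} \approx 1 + CPI_{\text{stall}} \quad (IC\ \text{large})$$
Performance degradation factor (performance relative to the stall-free pipeline):
$$\frac{CPI_{\text{ideal}}}{CPI_{\text{ideal}} + CPI_{\text{stall}}} = \frac{1}{1 + CPI_{\text{stall}}}$$
- Eliminate the cause of stalls: improve the implementation based on analysis of stalls (the main activity of hardware architects)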
Structural Hazards
- Conflict over access to a resource
- Typical structural hazard → unified cache hazard
  - Instructions and data in the same memory device
  - Cannot access data and fetch an instruction on the same clock cycle
- To prevent the hazard: stall INSTRUCTION FETCH during data MEMORY ACCESS
[Figure: 5-stage pipeline over CC1 to CC5 (Instruction Fetch, Instruction Decode, Execute, Data Access, Write Back) in which instruction fetch and data access share a single unified instruction-and-data memory.]
Stall Implementation for Cache Hazard
CC | IF | ID | EX | MEM | WB
CC1 | I1 | - | - | - | -
CC2 | LW | I1 | - | - | -
CC3 | I2 | LW | I1 | - | -
CC4 | I3 | I2 | LW | I1 | -
CC5 | φ | I3 | I2 | LW | I1
CC6 | I4 | φ | I3 | I2 | LW
CC7 | - | I4 | φ | I3 | I2
CC8 | - | - | I4 | φ | I3
CC9 | - | - | - | I4 | φ
CC10 | - | - | - | - | I4
- On CC5 the Load Word (LW) instruction blocks Instruction Fetch (IF)
  - No instruction is fetched on CC5
  - No instruction (NOP) is forwarded to ID on CC6
  - NOP = bubble = φ, forwarded to EX on CC7, etc.
Instruction-oriented view: I1, LW, I2, and I3 enter IF on CC1, CC2, CC3, and CC4 and finish WB on CC5 through CC8. I4 is delayed one cycle, entering IF on CC6 and finishing WB on CC10.
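The stall table above can be reproduced with a small cycle-by-cycle simulation in Python. This is a sketch under stated assumptions. Every occupant advances one stage per cycle, and whenever an instruction in `memory_ops` occupies MEM, fetch is blocked for that cycle and a bubble (φ) enters instead. The function and parameter names are hypothetical.

```python
BUBBLE = "φ"
STAGES = ("IF", "ID", "EX", "MEM", "WB")

def simulate_unified_cache(instrs, memory_ops):
    """Return one row per clock cycle: the occupants of IF, ID, EX, MEM, WB.

    Unified-cache structural hazard: while an instruction in `memory_ops`
    is in MEM, IF cannot fetch, so a bubble enters the pipeline instead.
    """
    state = [None] * len(STAGES)
    rows, next_i = [], 0
    while True:
        state = [None] + state[:-1]           # everything advances; WB retires
        if next_i < len(instrs):
            if state[3] in memory_ops:        # MEM is using the shared cache
                state[0] = BUBBLE             # fetch blocked: insert a bubble
            else:
                state[0] = instrs[next_i]
                next_i += 1
        if next_i >= len(instrs) and all(s is None for s in state):
            break                             # pipeline drained
        rows.append(list(state))
    return rows

rows = simulate_unified_cache(["I1", "LW", "I2", "I3", "I4"], {"LW"})
for cc, row in enumerate(rows, start=1):
    print(f"CC{cc}", *[s or "-" for s in row])
```

Five instructions take 10 cycles: 5 for the instructions, 4 to fill the pipeline, and 1 stall cycle caused by LW.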
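Calculating Effect of Cache Hazard on CPI
$$CPI_{\text{stall}} = \frac{\text{stall cycles}}{\text{instruction}} = \sum_{\text{types}} \frac{\text{stall cycles}}{\text{stall}} \times \frac{\text{stalls}}{\text{instruction}}$$
$$= \frac{1\ \text{stall cycle}}{\text{stall}} \times \frac{1\ \text{stall}}{\text{data memory load}} \times \frac{IC_{\text{load}}}{IC} + \frac{1\ \text{stall cycle}}{\text{stall}} \times \frac{1\ \text{stall}}{\text{data memory store}} \times \frac{IC_{\text{store}}}{IC}$$
$$= \frac{1\ \text{stall cycle}}{\text{stall}} \times \frac{1\ \text{stall}}{\text{data memory access}} \times \frac{IC_{\text{load}} + IC_{\text{store}}}{IC}$$
$$= \frac{1\ \text{stall cycle}}{\text{stall}} \times \frac{1\ \text{stall}}{\text{data memory access}} \times \left(\frac{0.25\ \text{loads}}{\text{instruction}} + \frac{0.15\ \text{stores}}{\text{instruction}}\right) = 0.40\ \frac{\text{stall cycles}}{\text{instruction}}$$
Assume: loads ~ 25%, stores ~ 15%, other ~ 60%.
$$CPI = CPI_{\text{ideal}} + CPI_{\text{stall}} = 1 + 0.40 = 1.40 \qquad \left(\text{degradation} = 1 - \frac{1}{1.40} \approx 29\%\right)$$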
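The same calculation in Python follows, with the load/store fractions as parameters. The sketch assumes, as above, one stall cycle per data memory access and $CPI_{\text{ideal}} = 1$. The function name is illustrative.

```python
def cache_hazard_cpi(load_frac, store_frac, cpi_ideal=1.0, stall_cycles=1):
    """CPI for a pipeline that stalls instruction fetch on every data access."""
    cpi_stall = stall_cycles * (load_frac + store_frac)   # stall cycles per instruction
    cpi = cpi_ideal + cpi_stall
    degradation = 1.0 - cpi_ideal / cpi                   # fraction of performance lost
    return cpi_stall, cpi, degradation

cpi_stall, cpi, degradation = cache_hazard_cpi(0.25, 0.15)
print(f"CPI_stall = {cpi_stall:.2f}, CPI = {cpi:.2f}, degradation = {degradation:.0%}")
# prints: CPI_stall = 0.40, CPI = 1.40, degradation = 29%
```

Separate instruction and data caches remove this stall entirely, which is why the simple RISC implementation above uses them.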
Data Hazards
Instruction result not ready when needed. Classification (named for the correct order of operations):
- Read After Write (RAW)
  - Correct: I2 reads register after I1 writes to it
  - Hazard: I2 reads register before I1 writes to it → I2 uses incorrect value
- Write After Write (WAW)
  - Correct: I2 writes to register after I1 writes to it
  - Hazard: I2 writes to register before I1 writes to it → incorrect value stays in register
- Write After Read (WAR)
  - Correct: I2 writes to register after I1 reads it
  - Hazard: I2 writes to register before I1 reads it → I1 uses incorrect value
- Read After Read (RAR): no hazard; reads do not affect registers
To prevent the hazard → stall the pipeline until the result is ready
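Below is a hypothetical Python helper that classifies the register dependencies between two instructions, I1 followed by I2. Each dependency can become a hazard of the corresponding type. Whether a hazard actually occurs depends on pipeline timing, which this sketch does not model. Instructions are represented as (destination, set of sources), following the convention that ADD R1,R2,R3 writes R1.

```python
def classify_hazards(i1, i2):
    """Dependencies of I2 on an earlier I1 that could become data hazards.

    Each instruction is (destination register, set of source registers).
    """
    d1, s1 = i1
    d2, s2 = i2
    hazards = set()
    if d1 in s2:
        hazards.add("RAW")    # I2 reads what I1 writes
    if d1 == d2:
        hazards.add("WAW")    # both write the same register
    if d2 in s1:
        hazards.add("WAR")    # I2 writes what I1 reads
    return hazards            # shared sources alone (RAR) never cause a hazard

add = ("R1", {"R2", "R3"})    # ADD R1,R2,R3
sub = ("R4", {"R5", "R1"})    # SUB R4,R5,R1
print(classify_hazards(add, sub))   # {'RAW'}
```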
Control Hazards
- Branch outcome affects the program counter (PC)
  - Taken: branch condition is true and PC ← PC + Disp
  - Not taken: branch condition is false and PC is not changed
  - Target: result of the calculation PC + Disp
- Branch hazard
  - Outcome not known until branch execution finishes
  - Pipeline automatically fetches the (default) instruction following the branch
  - Default instruction is not correct if the branch is taken
- To prevent the hazard
  - Flush default instructions
  - Stall the pipeline until branch condition and branch target are ready
- The delay in processing branch instructions is called the branch penalty
Exception Hazards
- Exception: hardware or software condition requiring a special service routine
- Interrupt: service response to an external hardware event
  - Usually asynchronous
  - Not triggered by program instructions
  - Does not affect validity of running instructions
- Trap: service response to a software condition in the running program
  - Usually synchronous
  - Triggered by program instructions
  - May stall or affect validity of running instructions
- Hazard: multiple instructions in various stages of execution in the pipeline
  - How/where/when to interrupt the pipeline?
  - Where is the return-point?
Precise Exception
- Return-point
  - Follows an atomic operation
  - Previous operations commit all results to state
  - No following operations commit any results to state
- Precise exception
  - Exception with a well-defined return-point
  - Service exception following an atomic operation
  - Restart execution at the return-point without error
[Figure: instructions I1 to I4 commit all state; the return-point is I5, and I5 to I8 commit no state; the Interrupt Service Routine runs between I4 and I5.]
Exception Hazards in 5-Stage Pipeline
- Exceptions specific to each stage
  - Memory access exception in IF or MEM
  - Instruction exception in ID
  - Arithmetic exception in EX
- 5 instructions in various stages of execution
  - Where is the return-point?
  - How to handle subsequent partially executed instructions?
[Figure: I1 to I5 in a 5-stage pipeline over CC1 to CC9, each starting one cycle after the previous; I2 raises an error in its MEM stage.]
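Berkeley Solution
- Attach an exception status field and source PC to each instruction in IF
- When an instruction raises an exception
  - Mark its status field with the exception
  - Continue the pipeline until the prior instruction completes (reaches WB)
  - RETURN-POINT ← PC of the instruction that raises the exception
  - Flush the pipeline (mark instructions in IF through MEM as NOP to cancel WB)
  - PC ← EXCEPTION SERVICE ROUTINE (ESR)
- Return from ESR depends on exception type
Instr | CC1 | CC2 | CC3 | CC4 | CC5 | CC6 | CC7 | CC8 | CC9 | CC10
I1 | IF | ID | EX | MEM | WB | - | - | - | - | -
I2 | - | IF | ID | EX | error | φ | - | - | - | -
I3 | - | - | IF | ID | EX | φ | φ | - | - | -
I4 | - | - | - | IF | ID | φ | φ | φ | - | -
I5 | - | - | - | - | IF | φ | φ | φ | φ | -
ESR | - | - | - | - | - | IF | ID | EX | MEM | WB
I1 completes atomically; return-point = I2.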
Topcs Lecture 4 Cache Memores Generc cache memory organzaton Drect mapped caches Set assocate caches Impact of caches on performance Cache Memores Cache memores are small, fast SRAM-based memores managed
More informationLecture 4: Instruction Set Architecture
Lecture 4: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation Reading: Textbook (5 th edition) Appendix A Appendix B (4 th edition)
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationMIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14
MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK
More informationNachos Project 3. Speaker: Sheng-Wei Cheng 2010/12/16
Nachos Project Speaker: Sheng-We Cheng //6 Agenda Motvaton User Programs n Nachos Related Nachos Code for User Programs Project Assgnment Bonus Submsson Agenda Motvaton User Programs n Nachos Related Nachos
More informationAADL : about scheduling analysis
AADL : about schedulng analyss Schedulng analyss, what s t? Embedded real-tme crtcal systems have temporal constrants to meet (e.g. deadlne). Many systems are bult wth operatng systems provdng multtaskng
More informationParallelism for Nested Loops with Non-uniform and Flow Dependences
Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr
More informationAppendix C: Pipelining: Basic and Intermediate Concepts
Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)
More informationInstruction Level Parallelism. Appendix C and Chapter 3, HP5e
Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation
More informationInstruction Pipelining Review
Instruction Pipelining Review Instruction pipelining is CPU implementation technique where multiple operations on a number of instructions are overlapped. An instruction execution pipeline involves a number
More informationAssignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.
Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton
More informationAdvanced Computer Architecture
Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes
More informationData Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach
Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer
More informationETAtouch RESTful Webservices
ETAtouch RESTful Webservces Verson 1.1 November 8, 2012 Contents 1 Introducton 3 2 The resource /user/ap 6 2.1 HTTP GET................................... 6 2.2 HTTP POST..................................
More informationProblem Definitions and Evaluation Criteria for Computational Expensive Optimization
Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty
More information4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management.
//7 Prnceton Unversty Computer Scence 7: Introducton to Programmng Systems Goals of ths Lecture Storage Management Help you learn about: Localty and cachng Typcal storage herarchy Vrtual memory How the
More informationUniprocessors. HPC Fall 2012 Prof. Robert van Engelen
Uniprocessors HPC Fall 2012 Prof. Robert van Engelen Overview PART I: Uniprocessors and Compiler Optimizations PART II: Multiprocessors and Parallel Programming Models Uniprocessors Processor architectures
More informationInstr. execution impl. view
Pipelining Sangyeun Cho Computer Science Department Instr. execution impl. view Single (long) cycle implementation Multi-cycle implementation Pipelined implementation Processing an instruction Fetch instruction
More informationSpeeding Up DLX Computer Architecture Hadassah College Spring 2018 Speeding Up DLX Dr. Martin Land
Speeding Up DLX 1 DLX Execution Stages Version 1 Clock Cycle 1 I 1 enters Instruction Fetch (IF) Clock Cycle2 I 1 moves to Instruction Decode (ID) Instruction Fetch (IF) holds state fixed Clock Cycle3
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationEITF20: Computer Architecture Part2.1.1: Instruction Set Architecture
EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Instruction Set Principles The Role of Compilers MIPS 2 Main Content Computer
More informationCache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access
Agenda Cache Performance Samra Khan March 28, 217 Revew from last lecture Cache access Assocatvty Replacement Cache Performance Cache Abstracton and Metrcs Address Tag Store (s the address n the cache?
More informationPredict Not Taken. Revisiting Branch Hazard Solutions. Filling the delay slot (e.g., in the compiler) Delayed Branch
branch taken Revisiting Branch Hazard Solutions Stall Predict Not Taken Predict Taken Branch Delay Slot Branch I+1 I+2 I+3 Predict Not Taken branch not taken Branch I+1 IF (bubble) (bubble) (bubble) (bubble)
More informationEfficient Distributed File System (EDFS)
Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate
More informationThe Codesign Challenge
ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Thoai Nam Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy & David a Patterson,
More informationInstruction Set Principles and Examples. Appendix B
Instruction Set Principles and Examples Appendix B Outline What is Instruction Set Architecture? Classifying ISA Elements of ISA Programming Registers Type and Size of Operands Addressing Modes Types of
More informationInstruction Level Parallelism. ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction
Instruction Level Parallelism ILP, Loop level Parallelism Dependences, Hazards Speculation, Branch prediction Basic Block A straight line code sequence with no branches in except to the entry and no branches
More informationThe Processor: Instruction-Level Parallelism
The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationA Model RISC Processor. DLX Architecture
DLX Architecture A Model RISC Processor 1 General Features Flat memory model with 32-bit address Data types Integers (32-bit) Floating Point Single precision (32-bit) Double precision (64 bits) Register-register
More informationInstruction Pipelining
Instruction Pipelining Simplest form is a 3-stage linear pipeline New instruction fetched each clock cycle Instruction finished each clock cycle Maximal speedup = 3 achieved if and only if all pipe stages
More informationCISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1
CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer
More informationOptimizing for Speed. What is the potential gain? What can go Wrong? A Simple Example. Erik Hagersten Uppsala University, Sweden
Optmzng for Speed Er Hagersten Uppsala Unversty, Sweden eh@t.uu.se What s the potental gan? Latency dfference L$ and mem: ~5x Bandwdth dfference L$ and mem: ~x Repeated TLB msses adds a factor ~-3x Execute
More informationMathematics 256 a course in differential equations for engineering students
Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the
More informationExecution/Effective address
Pipelined RC 69 Pipelined RC Instruction Fetch IR mem[pc] NPC PC+4 Instruction Decode/Operands fetch A Regs[rs]; B regs[rt]; Imm sign extended immediate field Execution/Effective address Memory Ref ALUOutput
More informationParallel matrix-vector multiplication
Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more
More informationCluster Analysis of Electrical Behavior
Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School
More informationPipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome
Pipeline Thoai Nam Outline Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome Reference: Computer Architecture: A Quantitative Approach, John L Hennessy
More informationNews. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example
Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationSpecifications in 2001
Specfcatons n 200 MISTY (updated : May 3, 2002) September 27, 200 Mtsubsh Electrc Corporaton Block Cpher Algorthm MISTY Ths document shows a complete descrpton of encrypton algorthm MISTY, whch are secret-key
More informationGiving credit where credit is due
CSCE 23J Computer Organzaton Cache Memores Dr. Stee Goddard goddard@cse.unl.edu Gng credt where credt s due Most of sldes for ths lecture are based on sldes created by Drs. Bryant and O Hallaron, Carnege
More informationThere are different characteristics for exceptions. They are as follows:
e-pg PATHSHALA- Computer Science Computer Architecture Module 15 Exception handling and floating point pipelines The objectives of this module are to discuss about exceptions and look at how the MIPS architecture
More informationCPU Architecture and Instruction Sets Chapter 1
CPU Architecture and Instruction Sets Chapter 1 1 Is CPU Architecture Relevant for DBMS? CPU design focuses on speed resulting in a 55%/year improvement since 1987: If CPU performance in database code
More informationCHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar
CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vdyanagar Faculty Name: Am D. Trved Class: SYBCA Subject: US03CBCA03 (Advanced Data & Fle Structure) *UNIT 1 (ARRAYS AND TREES) **INTRODUCTION TO ARRAYS If we want
More informationVerification by testing
Real-Tme Systems Specfcaton Implementaton System models Executon-tme analyss Verfcaton Verfcaton by testng Dad? How do they know how much weght a brdge can handle? They drve bgger and bgger trucks over
More informationCOSC4201 Pipelining. Prof. Mokhtar Aboelaze York University
COSC4201 Pipelining Prof. Mokhtar Aboelaze York University 1 Instructions: Fetch Every instruction could be executed in 5 cycles, these 5 cycles are (MIPS like machine). Instruction fetch IR Mem[PC] NPC
More informationOracle Database: SQL and PL/SQL Fundamentals Certification Course
Oracle Database: SQL and PL/SQL Fundamentals Certfcaton Course 1 Duraton: 5 Days (30 hours) What you wll learn: Ths Oracle Database: SQL and PL/SQL Fundamentals tranng delvers the fundamentals of SQL and
More informationProgramming in Fortran 90 : 2017/2018
Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values
More information3D vector computer graphics
3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres
More informationComputer Architecture
Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in
More informationPipelining. CS701 High Performance Computing
Pipelining CS701 High Performance Computing Student Presentation 1 Two 20 minute presentations Burks, Goldstine, von Neumann. Preliminary Discussion of the Logical Design of an Electronic Computing Instrument.
More informationProcessor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)
More informationIP Camera Configuration Software Instruction Manual
IP Camera 9483 - Confguraton Software Instructon Manual VBD 612-4 (10.14) Dear Customer, Wth your purchase of ths IP Camera, you have chosen a qualty product manufactured by RADEMACHER. Thank you for the
More informationIntroduction to Programming. Lecture 13: Container data structures. Container data structures. Topics for this lecture. A basic issue with containers
1 2 Introducton to Programmng Bertrand Meyer Lecture 13: Contaner data structures Last revsed 1 December 2003 Topcs for ths lecture 3 Contaner data structures 4 Contaners and genercty Contan other objects
More informationStorage Binding in RTL synthesis
Storage Bndng n RTL synthess Pe Zhang Danel D. Gajsk Techncal Report ICS-0-37 August 0th, 200 Center for Embedded Computer Systems Department of Informaton and Computer Scence Unersty of Calforna, Irne
More informationAn Optimal Algorithm for Prufer Codes *
J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationThese actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions.
MIPS Pipe Line 2 Introduction Pipelining To complete an instruction a computer needs to perform a number of actions. These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously
More informationWhat is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program.
Performance COMP375 Computer Architecture and dorganization What is Good Performance Which is the best performing jet? Airplane Passengers Range (mi) Speed (mph) Boeing 737-100 101 630 598 Boeing 747 470
More informationLecture 7: Pipelining Contd. More pipelining complications: Interrupts and Exceptions
Lecture 7: Pipelining Contd. Kunle Olukotun Gates 302 kunle@ogun.stanford.edu http://www-leland.stanford.edu/class/ee282h/ 1 More pipelining complications: Interrupts and Exceptions Hard to handle in pipelined
More informationCMPS 10 Introduction to Computer Science Lecture Notes
CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture
More informationMultiple Issue ILP Processors. Summary of discussions
Summary of discussions Multiple Issue ILP Processors ILP processors - VLIW/EPIC, Superscalar Superscalar has hardware logic for extracting parallelism - Solutions for stalls etc. must be provided in hardware
More informationChapter 2: Instructions How we talk to the computer
Chapter 2: Instructions How we talk to the computer 1 The Instruction Set Architecture that part of the architecture that is visible to the programmer - instruction formats - opcodes (available instructions)
More informationRISC & Superscalar. COMP 212 Computer Organization & Architecture. COMP 212 Fall Lecture 12. Instruction Pipeline no hazard.
COMP 212 Computer Organization & Architecture Pipeline Re-Cap Pipeline is ILP -Instruction Level Parallelism COMP 212 Fall 2008 Lecture 12 RISC & Superscalar Divide instruction cycles into stages, overlapped
More informationCOSC 6385 Computer Architecture. Instruction Set Architectures
COSC 6385 Computer Architecture Instruction Set Architectures Spring 2012 Instruction Set Architecture (ISA) Definition on Wikipedia: Part of the Computer Architecture related to programming Defines set
More informationAdvanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017
Advanced Parallel Architecture Lessons 5 and 6 Annalisa Massini - Pipelining Hennessy, Patterson Computer architecture A quantitive approach Appendix C Sections C.1, C.2 Pipelining Pipelining is an implementation
More informationMulti-cycle Instructions in the Pipeline (Floating Point)
Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining
More informationINSTRUCTION LEVEL PARALLELISM
INSTRUCTION LEVEL PARALLELISM Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix H, John L. Hennessy and David A. Patterson,
More informationProcessor Architecture
Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong (jinkyu@skku.edu)
More informationWilliam Stallings Computer Organization and Architecture. Chapter 11 CPU Structure and Function
William Stallings Computer Organization and Architecture Chapter 11 CPU Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data Registers
More informationIf you miss a key. Chapter 6: Demand Paging Source:
ADRIAN PERRIG & TORSTEN HOEFLER ( -6- ) Networks and Operatng Systems Chapter 6: Demand Pagng Source: http://redmne.replcant.us/projects/replcant/wk/samsunggalaxybackdoor If you mss a key after yesterday
More informationReminder: tutorials start next week!
Previous lecture recap! Metrics of computer architecture! Fundamental ways of improving performance: parallelism, locality, focus on the common case! Amdahl s Law: speedup proportional only to the affected
More information