EECS 470 Lecture 2. Performance, Power & ISA. Fall 2018. Jon Beaumont

1 Performance, Power & ISA Fall 2018 Jon Beaumont Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Mudge, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, University of Pennsylvania, and University of Wisconsin. Slide 1

2 Warm Up Riddle A car must drive 2 miles. It drives with an average speed of V_1 the first mile. How fast must it travel during the second mile so that its total average speed is twice that of the first mile (i.e. V_Total = 2 * V_1)? (Vote here: etc.ch/zwn) a) b) ½ V_1 c) 2 V_1 d) 4 V_1 e) Other Slide 2

3 Class logistics Last Time Discussed high level goals of computer architecture: Performance; Power; Cost, security, ease of programmability, etc. Discussed how to increase program performance: Mostly through adding parallelism. Limits of parallelism: Amdahl's Law. Slide 3

4 Today Dive into performance metrics a bit more Quantifying performance (throughput and latency) Discuss arithmetic of averages ISA overview Von Neumann architecture CISC vs RISC Power and Energy Start on 5-stage processor and pipeline review Slide 4

5 Administrative Lab 1 due Thursday at 4:29pm. Check off with GSI in OH. Project 1 due Saturday at 11:59pm. 9 submissions so far. Don't leave to the last minute. HW 1 due next Tuesday (9/18) at 11:59pm. Submit to Gradescope (see website). Should cover all material by Wednesday. Everyone have access to Canvas/Piazza/Gradescope? Do I have everyone's picture? Slide 5

6 Performance and Power Trends Source: Chris Batten Dissertation, MIT (2010) Slide 6

7 Performance Two definitions. Latency (execution time): time to finish a fixed task. Throughput (bandwidth): number of tasks in fixed time. Very different: throughput can exploit parallelism, latency can't (baking bread analogy). Often contradictory. Choose definition to match measurement goals. Example: move people from A to B, 10 miles. Car: capacity = 5, speed = 60 miles/hour. Bus: capacity = 60, speed = 20 miles/hour. Latency: car = 10 min, bus = 30 min. Throughput: car = 30 PPH, bus = 120 PPH (people per hour). Slide 7

8 Performance Improvement Processor A is X times faster than processor B if Latency(P,A) = Latency(P,B) / X and Throughput(P,A) = Throughput(P,B) * X. Processor A is X% faster than processor B if Latency(P,A) = Latency(P,B) / (1+X/100) and Throughput(P,A) = Throughput(P,B) * (1+X/100). Car/bus example. Latency? Car is 3 times (and 200%) faster than bus. Throughput? Bus is 4 times (and 300%) faster than car. Slide 8
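The car/bus numbers can be checked with a short Python sketch. It assumes the example's values are a 10-mile trip, a 5-person car at 60 miles/hour, and a 60-person bus at 20 miles/hour, and it counts one-way trips only; the function names are illustrative, not from the slides.

```python
def latency_minutes(distance_miles, speed_mph):
    """Time to finish one trip, in minutes."""
    return distance_miles / speed_mph * 60.0


def throughput_pph(capacity, distance_miles, speed_mph):
    """People delivered per hour, counting one-way trips only."""
    trip_hours = distance_miles / speed_mph
    return capacity / trip_hours


def percent_faster(times_faster):
    """Convert 'X times faster' into 'X% faster' using 1 + X/100."""
    return (times_faster - 1.0) * 100.0


car_latency = latency_minutes(10, 60)
bus_latency = latency_minutes(10, 20)
car_tput = throughput_pph(5, 10, 60)
bus_tput = throughput_pph(60, 10, 20)

# Lower latency is better, so the ratio is old / new.
car_vs_bus_latency = bus_latency / car_latency
# Higher throughput is better, so the ratio is new / old.
bus_vs_car_throughput = bus_tput / car_tput

print(f"latency: car {car_latency:.0f} min, bus {bus_latency:.0f} min")
print(f"throughput: car {car_tput:.0f} PPH, bus {bus_tput:.0f} PPH")
print(f"car is {car_vs_bus_latency:.0f}x "
      f"({percent_faster(car_vs_bus_latency):.0f}%) faster on latency")
print(f"bus is {bus_vs_car_throughput:.0f}x "
      f"({percent_faster(bus_vs_car_throughput):.0f}%) faster on throughput")
```

The same vehicle pair gives opposite winners depending on the metric, which is why the definition has to match the measurement goal.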

9 Latency vs Throughput What are three computing applications where we care mostly about throughput? What about latency? Slide 9

10 Averaging Performance Numbers I You can add latencies, but not throughputs. Latency(P1+P2, A) = Latency(P1,A) + Latency(P2,A). Throughput(P1+P2,A) != Throughput(P1,A) + Throughput(P2,A). E.g., 1 mile at 30 miles/hour + 1 mile at 90 miles/hour: average is not 60 miles/hour. 0.033 hours at 30 miles/hour + 0.011 hours at 90 miles/hour: average is only 45 miles/hour! (2 miles / (0.033 + 0.011 hours)) Slide 10

11 Averaging Performance Numbers II Latency(P1+P2, A) = Latency(P1,A) + Latency(P2,A). Throughput(P1+P2,A) = $\frac{1}{\frac{1}{\text{Throughput}(P1,A)} + \frac{1}{\text{Throughput}(P2,A)}}$. Three averaging techniques. Arithmetic: $\frac{1}{N} \sum_{P=1}^{N} \text{Latency}(P)$. For times: units proportional to time (e.g., latency). Harmonic: $\frac{N}{\sum_{P=1}^{N} \frac{1}{\text{Throughput}(P)}}$. For rates: units inversely proportional to time (e.g., throughput), unless time is fixed. Geometric: $\sqrt[N]{\prod_{P=1}^{N} \text{Speedup}(P)}$. For ratios: unitless quantities (e.g., speedups). Slide 11
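A minimal Python sketch of the three averages follows. The inputs are illustrative: the speed example assumes one mile driven at 30 miles/hour and one mile at 90 miles/hour, and the latency and speedup lists are made-up values.

```python
import math


def arithmetic_mean(times):
    """For quantities proportional to time (e.g., latencies)."""
    return sum(times) / len(times)


def harmonic_mean(rates):
    """For rates over equal amounts of work (e.g., throughputs)."""
    return len(rates) / sum(1.0 / r for r in rates)


def geometric_mean(ratios):
    """For unitless ratios (e.g., speedups)."""
    return math.prod(ratios) ** (1.0 / len(ratios))


speeds_mph = [30.0, 90.0]          # one mile driven at each speed
print(arithmetic_mean(speeds_mph))  # naive average of the rates
print(harmonic_mean(speeds_mph))    # true average speed over the 2 miles
print(arithmetic_mean([2.0, 4.0]))  # hypothetical latencies in seconds
print(geometric_mean([2.0, 8.0]))   # hypothetical speedups
```

Averaging the two speeds arithmetically gives 60 miles/hour, while the harmonic mean gives the 45 miles/hour that the car actually achieves, because it spends three times as long on the slow mile.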

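12 The Iron Law of Processor Performance Processor Performance = $\frac{\text{Time}}{\text{Program}}$, where $\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$, i.e. (code size) × (CPI) × (cycle time). Architecture --> Implementation --> Realization. Compiler Designer, Processor Designer, Chip Designer. Slide 12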

13 Danger: Partial Performance Metrics Micro-architects often ignore dynamic instruction count. Typically work in one ISA/one compiler, so treat it as fixed. Iron law reduces to seconds / instruction = (cycles / instruction) * (seconds / cycle). MIPS (millions of instructions per second) = instructions / second * 10^-6. Cycles / second: clock frequency (in MHz). Example: CPI = 2, clock = 500 MHz, what is MIPS? 0.5 * 500 MHz * 10^-6 = 250 MIPS. Problems: compiler removes instructions, program gets faster. However, MIPS goes down (misleading). Slide 13

14 Danger: Partial Performance Metrics II Micro-architects often ignore instructions/program, but the general public (mostly) also ignores CPI. Equates clock frequency with performance!! Which processor would you buy? Processor A: CPI = 2, clock = 500 MHz. Processor B: CPI = 1, clock = 300 MHz. Probably A, but B is faster (assuming same ISA/compiler). Classic example: 800 MHz Pentium III faster than 1 GHz Pentium 4. Same ISA and compiler. Slide 14
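The comparison can be sketched in Python using the iron law. This assumes Processor A has CPI = 2 at 500 MHz and Processor B has CPI = 1 at 300 MHz; the one-billion-instruction program is a hypothetical workload, and both processors are assumed to run the same binary.

```python
def mips(cpi, clock_hz):
    """Millions of instructions per second: (instructions/second) * 10^-6."""
    return clock_hz / (cpi * 1e6)


def execution_time_s(instructions, cpi, clock_hz):
    """Iron law: instructions * (cycles/instruction) * (seconds/cycle)."""
    return instructions * cpi / clock_hz


PROGRAM_INSTRUCTIONS = 1_000_000_000  # hypothetical dynamic instruction count

a_mips = mips(cpi=2, clock_hz=500e6)
b_mips = mips(cpi=1, clock_hz=300e6)
a_time = execution_time_s(PROGRAM_INSTRUCTIONS, cpi=2, clock_hz=500e6)
b_time = execution_time_s(PROGRAM_INSTRUCTIONS, cpi=1, clock_hz=300e6)

print(f"A: {a_mips:.0f} MIPS, {a_time:.2f} s")
print(f"B: {b_mips:.0f} MIPS, {b_time:.2f} s")
```

B finishes first despite the lower clock. The comparison only holds because the instruction count is the same for both; with different ISAs or compilers, MIPS alone would not be enough.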

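15 Performance Key Points Amdahl's law: $S_{\text{overall}} = \frac{1}{(1 - f) + \frac{f}{S}}$. Iron law: $\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$. Averaging techniques. Arithmetic (time): $\frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$. Harmonic (rates): $\frac{n}{\sum_{i=1}^{n} \frac{1}{\text{Rate}_i}}$. Geometric (ratios): $\sqrt[n]{\prod_{i=1}^{n} \text{Ratio}_i}$. Slide 15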
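Amdahl's law is easy to explore numerically. The sketch below takes f as the fraction of execution time that is sped up and S as the speedup of that fraction; the sample values are illustrative.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the time is sped up by s."""
    return 1.0 / ((1.0 - f) + f / s)


for f, s in [(0.5, 2.0), (0.9, 10.0), (0.9, 1e9)]:
    print(f"f={f}, S={s:g}: overall speedup {amdahl_speedup(f, s):.3f}")
```

Even with an enormous S, the overall speedup approaches 1 / (1 - f), so a program that is 90% improvable tops out near 10x.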

16 Instruction Set Architecture Slide 16

17 Instruction Set Architecture "Instruction set architecture (ISA) is the structure of a computer that a machine language programmer (or a compiler) must understand to write a correct (timing independent) program for that machine" (IBM, introducing the 360 in 1964). IBM 360 is a family of binary-compatible machines with distinct microarchitectures and technologies, ranging from Model 30 (8-bit datapath, up to 64KB memory) to Model 70 (64-bit datapath, 512KB memory) and later Model 360/91 (the Tomasulo). IBM 360 replaced 4 concurrent, but incompatible lines of IBM architectures developed over the previous 10 years. Slide 17

18 ISA: A contract between HW and SW ISA (instruction set architecture) A well-defined hardware/software interface The contract between software and hardware Functional definition of operations, modes, and storage locations supported by hardware Precise description of how to invoke, and access them No guarantees regarding How operations are implemented Which operations are fast and which are slow and when Which operations take more power and which take less Slide 18

19 von Neumann Model of a Computer Key idea: Memory contains both instructions and data. Instructions can be operated on as if they are data. Self-modifying code mostly discouraged now. But compilers take as input a program and produce another program! Turing machines are von Neumann machines. Slide 19

20 Sequential Model of Computing Each instruction is executed one after the other. Branch instructions can change this, and can be done conditionally. Tied to a program counter. The microarchitectures that we will study conform to the sequential execution model, but under the hood they execute instructions out-of-order (OoO). Other models? Dataflow? Slide 20

21 Components of an ISA Programmer-visible states: program counter, general purpose registers, memory, control registers. Programmer-visible behaviors (state transitions): what to do, when to do it. Example register-transfer-level description of an instruction: if imem[pc] == "add rd, rs, rt" then pc ← pc+1, gpr[rd] = gpr[rs] + gpr[rt]. A binary encoding. ISAs last 25+ years (because of SW cost), so be careful what goes in. Slide 21
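The register-transfer description of add can be written as a tiny state-transition function. This is only a sketch: the `State` class, the tuple instruction format, and the choice of 8 registers are assumptions made for illustration, not part of any real ISA definition.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Programmer-visible state: program counter and general purpose registers."""
    pc: int = 0
    gpr: list = field(default_factory=lambda: [0] * 8)  # 8 registers assumed


def step(state, imem):
    """Apply one state transition for the instruction at imem[pc]."""
    op, rd, rs, rt = imem[state.pc]
    if op == "add":
        state.gpr[rd] = state.gpr[rs] + state.gpr[rt]
        state.pc = state.pc + 1
    else:
        raise ValueError(f"unsupported opcode: {op}")


imem = [("add", 3, 1, 2)]   # hypothetical encoding of "add r3, r1, r2"
state = State()
state.gpr[1], state.gpr[2] = 3, 4
step(state, imem)
print(state.pc, state.gpr[3])
```

Nothing in the function says how long the add takes or how it is wired, which matches the contract view of an ISA: it fixes the state change, not the implementation.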

22 RISC vs CISC Recall Iron law: (instructions/program) * (cycles/instruction) * (seconds/cycle) CISC (Complex Instruction Set Computing) Improve instructions/program with complex instructions Easy for assembly-level programmers, good code density RISC (Reduced Instruction Set Computing) Improve cycles/instruction with many single-cycle instructions Increases instruction/program, but hopefully not as much Help from smart compiler Perhaps improve clock cycle time (seconds/cycle) via aggressive implementation allowed by simpler instructions Slide 22

23 What Makes a Good ISA? Programmability: easy to express programs efficiently? Implementability: easy to design high-performance implementations? More recently: easy to design low-power implementations? Easy to design high-reliability implementations? Easy to design low-cost implementations? Compatibility: easy to maintain programmability (implementability) as languages and programs (technology) evolve? x86 (IA32) generations: 8086, 286, 386, 486, Pentium, PentiumII, PentiumIII, Pentium4, ... Slide 23

24 Typical Instructions (Opcodes)
Type: Example Instruction
Arithmetic and logical: and, add
Data transfer: move, load
Control: branch, jump, call, return
System: trap, rett
Floating point: add, mul, div, sqrt
Decimal: addd, convert
String: move, compare
What operations are necessary? {sub, ld & st, conditional br.} What is the minimum complete ISA for a von Neumann machine? Too little or too simple: not expressive enough; difficult to program (by hand); programs tend to be bigger. Too much or too complex: most of it won't be used; too much baggage for implementation; difficult choices during compiler optimization. Slide 24

25 Power Slide 25

26 Introduction Why is power a problem in a μp? Power used by the μp, vs. system power. Dissipating heat: melting (very bad); packaging (to cool: $); heat leads to poorer performance. Providing power: battery; cost of electricity. Slide 26

27 Where does the juice go in laptops? Others have measured ~55% processor increase under max load in laptops [Hsu+Kremer, 2002] Slide 27

28 What about servers? SunFire T2000. DRAM >20% and growing; CPU <25% and shrinking. AC to DC only 60-90% efficient. [Pie chart: power breakdown by Processor, Memory, I/O, Disk, Services, Fans, AC/DC Conversion; slice values as extracted: 20%, 23%, 20%, 4%, 10%, 9%, 14%.] Need whole-system approaches to save energy. Slide 28

29 Why is power a problem? Why worry about power dissipation? Battery life Thermal issues: affect cooling, packaging, reliability, timing Environment Slide 29

30 Why is power a problem? Total Power Dissipation Trends [Chart: power density (W/cm^2) for Pentium, Pentium Pro, Pentium 2, Pentium 3, Pentium 4, and Pentium 4 (Prescott), with a hot plate and a nuclear reactor as reference points.] Slide 30

31 Why is power a problem? Spot Heat Issues in Microprocessors Slide 31

32 Why is power a problem? Packaging cost: complex and expensive (note heatpipe). Source: H. Xie et al., "Packaging the Itanium Microprocessor," Electronic Components and Technology Conference, 2002. Slide 32

33 Temperature/di-dt-Constrained Power-Aware Computing Applications Energy-Constrained Computing Slide 33

34 Data center energy use Installed base grows 11%/yr. By 2011, 2.5% of US energy; $7.4 billion/yr. (Source: US EPA) 0.5% of world CO2 emissions; rivals entire Czech Republic. (Source: Mankoff et al., IEEE Computer) [Bar chart: CO2 emissions (mil. metric tons) for Nigeria, data centers, and the Czech Republic.] Improving energy efficiency is a critical challenge. Slide 34

35 Where does all the power go? [Pie chart: data center power by category (IT Equipment, Cooling, UPS, Power Delivery, Lighting); slice values as extracted: 38%, 5%, 4%, 1%, 52%. Source: Liebert 2007.] Servers account for barely half of power. 1W of cooling per 1.5W of IT load. 1W data center: cooling costs $4 to $8 / yr. System designers must think about cooling. Slide 35

36 Why is power a problem? Power-aware design is needed across all computing platforms. Mobile/portable (cell phones, laptops, PDA): battery life is critical. Desktops/Set-Top (PCs and game machines): packaging cost is critical. Servers (mainframes and compute-farms): packaging limits; volumetric (performance density). Slide 36

37 What uses power in a chip Slide 37

38 What uses power in a chip? How CMOS Transistors Work Slide 38

39 What uses power in a chip? MOS Transistors are Switches Slide 39

40 What uses power in a chip? Power: The Basics Dynamic power vs. static power. Dynamic: switching power. Static: leakage power. Dynamic power dominates, but static power is increasing in importance. Static power: steady, per-cycle energy cost. Dynamic power: capacitive and short-circuit. Capacitive power: charging/discharging at transitions from 0→1 and 1→0. Short-circuit power: power due to brief short-circuit current during transitions. Slide 40

41 What uses power in a chip? Dynamic (Capacitive) Power Dissipation [Figure: CMOS gate with input V_IN, output V_OUT, load capacitance C_L, and current I.] Data dependent: a function of switching activity. Slide 41

42 What uses power in a chip? Capacitive Power dissipation Power ~ ½ C V^2 A f. Capacitance (C): function of wire length, transistor size. Supply voltage (V): has been dropping with successive fab generations. Activity factor (A): how often, on average, do wires switch? Clock frequency (f): increasing. Slide 42
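The capacitive power relation, Power ~ ½ C V^2 A f, can be tried out with a few lines of Python. The component values below (1 nF of switched capacitance, 1.5 V, activity factor 0.5, 1 GHz) are hypothetical and chosen only to make the arithmetic visible.

```python
def dynamic_power_w(c_farads, v_volts, activity, f_hz):
    """Capacitive switching power: 1/2 * C * V^2 * A * f."""
    return 0.5 * c_farads * v_volts ** 2 * activity * f_hz


base = dynamic_power_w(1e-9, 1.5, 0.5, 1e9)
half_voltage = dynamic_power_w(1e-9, 0.75, 0.5, 1e9)
half_frequency = dynamic_power_w(1e-9, 1.5, 0.5, 0.5e9)

print(f"baseline:       {base:.4f} W")
print(f"half voltage:   {half_voltage:.4f} W ({half_voltage / base:.2f}x)")
print(f"half frequency: {half_frequency:.4f} W ({half_frequency / base:.2f}x)")
```

Halving the voltage cuts power to a quarter, while halving the frequency only halves it, which is why supply voltage is the more powerful knob.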

43 What uses power in a chip? Power vs. Energy Power consumption in watts: rate at which energy is consumed over time; determines battery life in hours; sets packaging limits. Energy efficiency in joules: Energy = power * delay (joules = watts * seconds). Lower energy number means less power to perform a computation at same frequency. Slide 43
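Because energy = power * delay, a lower-power design is not automatically a lower-energy one. The Python sketch below compares two hypothetical designs running the same task; the wattages and run times are made up for illustration.

```python
def energy_joules(power_watts, delay_seconds):
    """Energy = power * delay (joules = watts * seconds)."""
    return power_watts * delay_seconds


# Hypothetical designs running the same task.
fast_design = energy_joules(power_watts=10.0, delay_seconds=2.0)
slow_design = energy_joules(power_watts=5.0, delay_seconds=5.0)

print(f"fast design: 10 W for 2 s = {fast_design:.0f} J")
print(f"slow design:  5 W for 5 s = {slow_design:.0f} J")
```

The slow design draws half the power, which helps with packaging and cooling, but it uses more total energy and so drains a battery faster for this task.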

44 What uses power in a chip? Power vs. Energy Slide 44

45 Energy vs Power What are three computing applications where we care about energy more than power? What about power over energy? Slide 45

46 What uses power in a chip? Voltage Scaling Scenario: 80W, 1 BIPS, 1.5V, 1GHz. Cache optimization: IPC decreases by 10%, reduces power by 20% => final processor: 900 MIPS, 64W. What if we just adjust frequency/voltage on processor? How to reduce power by 20%? P = C V^2 F ∝ C V^3 => drop voltage by 7% (and also frequency) => 0.93 * 0.93 * 0.93 = 0.8x. So for equal power (64W): cache optimization = 900 MIPS; simple voltage/frequency scaling = 930 MIPS. Slide 46
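The voltage-scaling comparison can be reproduced in Python. The sketch assumes a baseline of 80 W and 1000 MIPS, a cache optimization that costs 10% of performance for a 20% power saving, and the simplifying model that frequency scales linearly with voltage so that power scales with the cube of the scale factor.

```python
def cache_optimization(power_w, mips):
    """Assumed effect: 20% less power, 10% lower IPC (and so 10% fewer MIPS)."""
    return power_w * 0.8, mips * 0.9


def voltage_frequency_scaling(power_w, mips, scale):
    """P ~ C * V^2 * f with f proportional to V, so P ~ scale^3; MIPS ~ scale."""
    return power_w * scale ** 3, mips * scale


base_power_w, base_mips = 80.0, 1000.0
opt_power, opt_mips = cache_optimization(base_power_w, base_mips)
dvfs_power, dvfs_mips = voltage_frequency_scaling(base_power_w, base_mips, 0.93)

print(f"cache optimization: {opt_mips:.0f} MIPS at {opt_power:.1f} W")
print(f"V/f scaling (0.93): {dvfs_mips:.0f} MIPS at {dvfs_power:.1f} W")
```

At roughly the same power, plain voltage/frequency scaling keeps more performance than the cache optimization, so an architectural power optimization has to beat this baseline to be worthwhile.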

47 Multicore: Solution to Power-constrained design? Power scales roughly cubically with frequency. Scale clock frequency to 80%. Now add a second core. Same power budget, but 1.6x performance! But: must parallelize application. Remember Amdahl's Law! [Chart: performance and power.] Slide 47
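A short Python sketch shows where the 1.6x figure comes from and how Amdahl's law erodes it. It assumes power scales with the cube of the clock scale factor, each core runs at 80% of the original frequency, and performance of the parallel part scales linearly with core count; the parallel fractions are illustrative.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup from running the parallel fraction on several cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def scaled_multicore(freq_scale, parallel_fraction, cores=2):
    """Return (relative power, relative performance) versus one full-speed core."""
    power = cores * freq_scale ** 3
    performance = freq_scale * amdahl_speedup(parallel_fraction, cores)
    return power, performance


for fraction in (1.0, 0.9, 0.5, 0.0):
    power, perf = scaled_multicore(0.8, fraction)
    print(f"parallel fraction {fraction:.1f}: "
          f"power {power:.3f}x, performance {perf:.3f}x")
```

With a fully parallel application the two slower cores deliver 1.6x performance at about the same power, but a fully serial application runs at only 0.8x, slower than the original core.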

48 The Execution Core: Pipelining Slide 48

49 Outline: Understanding the Execution Core 1. 370's 5-stage pipeline (review) 2. Implementing pipeline interlocks (review) 3. Scoreboard scheduling (CDC 6600) 4. Tomasulo's OoO scheduling algorithm (IBM 360) 5. Precise interrupts with a Reorder Buffer (P6, Core) 6. Modern OoO (MIPS R10K, Alpha 21264, Netburst) Slide 49

50 Before there was pipelining [Timing diagram. Single-cycle: insn0.fetch, dec, exec | insn1.fetch, dec, exec. Multi-cycle: insn0.fetch | insn0.dec | insn0.exec | insn1.fetch | insn1.dec | insn1.exec.] Basic datapath: fetch, decode, execute. Single-cycle control: hardwired. + Low CPI (1). - Long clock period (to accommodate slowest instruction). Multi-cycle control: micro-programmed. + Short clock period. - High CPI. Slide 50

51 Speeding Up [Timing diagram. Single-cycle: insn0.fetch, dec, exec | insn1.fetch, dec, exec. Multi-cycle: insn0.fetch | insn0.dec | insn0.exec | insn1.fetch | insn1.dec | insn1.exec.] Remember, three ways to speed up a process: reduce number of tasks (possible?); decrease latency of tasks (possible?); parallelize. How do we parallelize this pipeline? Slide 51

52 Parallelize [Timing diagram. Multi-cycle: insn0.fetch | insn0.dec | insn0.exec | insn1.fetch | insn1.dec | insn1.exec.] Duplicate pipeline (superscalar): effective, but expensive (>2x hardware overhead). Discuss more later in semester. [Timing diagram: insn0 and insn1 each run fetch, dec, exec side by side on duplicated hardware.] Or pipeline! [Timing diagram: insn1.fetch overlaps insn0.dec, and insn1.dec overlaps insn0.exec.] Slide 52

53 Pipelining [Timing diagram. Multi-cycle: insn0.fetch | insn0.dec | insn0.exec | insn1.fetch | insn1.dec | insn1.exec. Pipelined: insn1.fetch overlaps insn0.dec, and insn1.dec overlaps insn0.exec.] Important performance technique. Improves throughput at the expense of latency. Why does latency go up? Begin with multi-cycle design. When an instruction advances from stage 1 to 2, allow the next instruction to enter stage 1. Each instruction still passes through all stages. + But instructions enter and leave at a much faster rate. Not much hardware overhead (what needs to be added?). Slide 53
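The throughput-versus-latency trade can be sketched with a simple timing model in Python. Everything here is an assumption for illustration: three stages of 2 ns each, a 0.5 ns pipeline-register (latch) overhead per stage, no hazards or stalls, and every instruction using every stage.

```python
def unpipelined_time_ns(stage_ns, n_instructions):
    """Each instruction runs all stages before the next one starts."""
    return sum(stage_ns) * n_instructions


def pipelined_clock_ns(stage_ns, latch_ns):
    """Clock period is set by the slowest stage plus latch overhead."""
    return max(stage_ns) + latch_ns


def pipelined_time_ns(stage_ns, n_instructions, latch_ns):
    """Fill the pipeline once, then finish one instruction per cycle."""
    cycles = len(stage_ns) + n_instructions - 1
    return cycles * pipelined_clock_ns(stage_ns, latch_ns)


def pipelined_latency_ns(stage_ns, latch_ns):
    """Time for a single instruction to traverse every stage."""
    return len(stage_ns) * pipelined_clock_ns(stage_ns, latch_ns)


stages = [2.0, 2.0, 2.0]   # hypothetical fetch, decode, execute delays
print(unpipelined_time_ns(stages, 100))       # total time, 100 instructions
print(pipelined_time_ns(stages, 100, 0.5))    # total time when pipelined
print(sum(stages), pipelined_latency_ns(stages, 0.5))  # per-instruction latency
```

Under these assumptions, 100 instructions take 600 ns unpipelined and 255 ns pipelined, while a single instruction's latency rises from 6 ns to 7.5 ns because of the added pipeline registers.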

54 Pipeline Illustrated: [Figure: (1) one latch (L) followed by combinational logic with n gate delays, BW = ~(1/n); (2) two latched stages of n/2 gate delays each, BW = ~(2/n); (3) three latched stages of n/3 gate delays each, BW = ~(3/n).] Slide 54

55 370 Processor Pipeline Review [Figure: five stages, Fetch, Decode, Execute, Memory, (Write-back), built from the PC with a +4 increment, I-cache, Reg File, ALU, and D-cache.] T_pipeline = T_base / 5 Slide 55

56 Stage 1: Fetch Fetch an instruction from memory every cycle. Use PC to index memory. Increment PC (assume no branches for now). Write state to the pipeline register (IF/ID). The next stage will read this pipeline register. Note that the pipeline register must be edge triggered. Slide 56

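57 [Figure: Stage 1 (Fetch) datapath. The PC (with enable) indexes the instruction memory/cache and feeds a +1 adder; the instruction bits and PC + 1 are written to the IF/ID pipeline register (with enable), followed by the rest of the pipelined datapath.] Slide 57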

58 Stage 2: Decode Decodes opcode bits. May set up control signals for later stages. Read input operands from the register file, specified by the regA and regB fields of the instruction bits. Write state to the pipeline register (ID/EX): opcode; register contents; offset & destination fields; PC+1 (even though decode didn't use it). Slide 58

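59 [Figure: Stage 2 (Decode) datapath. The IF/ID pipeline register supplies the instruction bits and PC + 1 from the Stage 1 (Fetch) datapath; the regA and regB fields index the register file (which also has destReg, data, and enable inputs); the contents of regA, contents of regB, PC + 1, instruction bits, and control signals are written to the ID/EX pipeline register, followed by the rest of the pipelined datapath.] Slide 59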

60 Stage 3: Execute Perform ALU operation. Input operands can be: contents of regA or regB; offset field of the instruction. Branches: calculate PC+1+offset. Write state to the pipeline register (EX/Mem): ALU result, contents of regB, and PC+1+offset; instruction bits for opcode and destReg specifiers. Slide 60

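61 [Figure: Stage 3 (Execute) datapath. The ID/EX pipeline register supplies the contents of regA, contents of regB, PC + 1, and control signals from the Stage 2 (Decode) datapath; the ALU computes the ALU result and an adder computes PC+1+offset; the ALU result, contents of regB, PC+1+offset, and control signals are written to the EX/Mem pipeline register, followed by the rest of the pipelined datapath.] Slide 61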

62 Stage 4: Memory Operation Perform data cache access for memory ops. ALU result contains address for ld and st. Opcode bits control memory R/W and enable signals. Write state to the pipeline register (Mem/WB): ALU result and MemData; instruction bits for opcode and destReg specifiers. Slide 62

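63 [Figure: Stage 4 (Memory) datapath. The EX/Mem pipeline register supplies the ALU result, contents of regB, PC+1+offset, and control signals from the Stage 3 (Execute) datapath; the data memory (with enable and R/W inputs) produces the memory read data; the ALU result, memory read data, and control signals are written to the Mem/WB pipeline register. PC+1+offset goes back to the MUX before the PC in stage 1, together with the MUX control for the PC input.] Slide 63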

64 Stage 5: Write back Writing result to register file (if required). Write MemData to destReg for ld instruction. Write ALU result to destReg for arithmetic instruction. Opcode bits control register write enable signal. Slide 64

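65 [Figure: Stage 5 (Write back) datapath. The Mem/WB pipeline register supplies the memory read data, ALU result, and control signals from the Stage 4 (Memory) datapath; the selected value goes back to the data input of the register file, the register write enable goes back to the register file, and the destination register specifier goes back from instruction bits 0-2.] Slide 65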

66 Sample Code (Simple) Run the following code on a pipelined datapath:
add ; reg 3 = reg 1 + reg 2
nand ; reg 6 = ~(reg 4 & reg 5)
lw ; reg 4 = Mem[reg2+2]
add ; reg 5 = reg 2 + reg 5
sw ; Mem[reg3+1] = reg 7
Slide 66

67 [Figure: the full 5-stage pipelined datapath: PC, instruction memory, register file (R0-R7), ALU, and data memory, separated by the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers.] Slide 67

68 [Figure: pipelined datapath, initial state. All four pipeline registers hold noop.] Slide 68

69 [Figure: pipelined datapath at Time 1. Fetch: add. IF/ID holds add; ID/EX, EX/Mem, and Mem/WB hold noop.] Slide 69

70 [Figure: pipelined datapath at Time 2. Fetch: nand. IF/ID holds nand, ID/EX holds add; EX/Mem and Mem/WB hold noop.] Slide 70

71 [Figure: pipelined datapath at Time 3. Fetch: lw. IF/ID holds lw, ID/EX holds nand, EX/Mem holds add; Mem/WB holds noop.] Slide 71

72 [Figure: pipelined datapath at Time 4. Fetch: add. IF/ID holds the second add, ID/EX holds lw, EX/Mem holds nand, Mem/WB holds the first add.] Slide 72

73 [Figure: pipelined datapath at Time 5. Fetch: sw. IF/ID holds sw, ID/EX holds add, EX/Mem holds lw, Mem/WB holds nand.] Slide 73

74 [Figure: pipelined datapath at Time 6. No more instructions. ID/EX holds sw, EX/Mem holds add, Mem/WB holds lw.] Slide 74

75 [Figure: pipelined datapath at Time 7. No more instructions. EX/Mem holds sw, Mem/WB holds add.] Slide 75

76 [Figure: pipelined datapath at Time 8. No more instructions. Mem/WB holds sw.] Slide 76

77 [Figure: pipelined datapath at Time 9. No more instructions. sw completes write back and the pipeline registers are empty.] Slide 77

78 Time graphs
Time:  1      2       3        4        5          6          7          8          9
add    fetch  decode  execute  memory   writeback
nand          fetch   decode   execute  memory     writeback
lw                    fetch    decode   execute    memory     writeback
add                            fetch    decode     execute    memory     writeback
sw                                      fetch      decode     execute    memory     writeback
Slide 78
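The time graph follows a simple rule: instruction i (counting from 0) occupies stage s in cycle i + s + 1. The Python sketch below generates the graph from that rule, assuming an ideal pipeline with no stalls or hazards; the function name and output layout are illustrative.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def time_graph(instructions):
    """Return (total_cycles, rows); each row is (name, stage per cycle)."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage   # cycle number is i + s + 1 (1-indexed)
        rows.append((name, cells))
    return total_cycles, rows


total, rows = time_graph(["add", "nand", "lw", "add", "sw"])
print("Time: " + "".join(f"{t:<10}" for t in range(1, total + 1)))
for name, cells in rows:
    print(f"{name:<6}" + "".join(f"{cell:<10}" for cell in cells))
```

Five instructions on five stages finish in 5 + 5 - 1 = 9 cycles, compared with 25 cycles if each instruction had to finish all its stages before the next one started.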

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Multiple Issue ILP Processors. Summary of discussions

Multiple Issue ILP Processors. Summary of discussions Summary of discussions Multiple Issue ILP Processors ILP processors - VLIW/EPIC, Superscalar Superscalar has hardware logic for extracting parallelism - Solutions for stalls etc. must be provided in hardware

More information

are Softw Instruction Set Architecture Microarchitecture are rdw

are Softw Instruction Set Architecture Microarchitecture are rdw Program, Application Software Programming Language Compiler/Interpreter Operating System Instruction Set Architecture Hardware Microarchitecture Digital Logic Devices (transistors, etc.) Solid-State Physics

More information

Performance of computer systems

Performance of computer systems Performance of computer systems Many different factors among which: Technology Raw speed of the circuits (clock, switching time) Process technology (how many transistors on a chip) Organization What type

More information

ECE 587 Advanced Computer Architecture I

ECE 587 Advanced Computer Architecture I ECE 587 Advanced Computer Architecture I Instructor: Alaa Alameldeen alaa@ece.pdx.edu Spring 2015 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2015 1 When and Where? When:

More information

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture The Processor Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut CSE3666: Introduction to Computer Architecture Introduction CPU performance factors Instruction count

More information

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e

Instruction Level Parallelism. Appendix C and Chapter 3, HP5e Instruction Level Parallelism Appendix C and Chapter 3, HP5e Outline Pipelining, Hazards Branch prediction Static and Dynamic Scheduling Speculation Compiler techniques, VLIW Limits of ILP. Implementation

More information

RISC Pipeline. Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. See: P&H Chapter 4.6

RISC Pipeline. Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. See: P&H Chapter 4.6 RISC Pipeline Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6 A Processor memory inst register file alu PC +4 +4 new pc offset target imm control extend =? cmp

More information

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin.   School of Information Science and Technology SIST CS 110 Computer Architecture Pipelining Guest Lecture: Shu Yin http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C

More information

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1 Introduction Chapter 4.1 Chapter 4.2 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

Computer Architecture. Lecture 6.1: Fundamentals of

Computer Architecture. Lecture 6.1: Fundamentals of CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and

More information

Computer Architecture s Changing Definition

Computer Architecture s Changing Definition Computer Architecture s Changing Definition 1950s Computer Architecture Computer Arithmetic 1960s Operating system support, especially memory management 1970s to mid 1980s Computer Architecture Instruction

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture

More information

ECE260: Fundamentals of Computer Engineering

ECE260: Fundamentals of Computer Engineering Datapath for a Simplified Processor James Moscola Dept. of Engineering & Computer Science York College of Pennsylvania Based on Computer Organization and Design, 5th Edition by Patterson & Hennessy Introduction

More information

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI Contents 1.7 - End of Chapter 1 Power wall The multicore era

More information

ECE 4750 Computer Architecture, Fall 2014 T01 Single-Cycle Processors

ECE 4750 Computer Architecture, Fall 2014 T01 Single-Cycle Processors ECE 4750 Computer Architecture, Fall 2014 T01 Single-Cycle Processors School of Electrical and Computer Engineering Cornell University revision: 2014-09-03-17-21 1 Instruction Set Architecture 2 1.1. IBM

More information

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining CS 152 Computer rchitecture and Engineering Lecture 4 Pipelining 2014-1-30 John Lazzaro (not a prof - John is always OK) T: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: 1 otorola 68000 Next week

More information

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste

More information

Homework 5. Start date: March 24 Due date: 11:59PM on April 10, Monday night. CSCI 402: Computer Architectures

Homework 5. Start date: March 24 Due date: 11:59PM on April 10, Monday night. CSCI 402: Computer Architectures Homework 5 Start date: March 24 Due date: 11:59PM on April 10, Monday night 4.1.1, 4.1.2 4.3 4.8.1, 4.8.2 4.9.1-4.9.4 4.13.1 4.16.1, 4.16.2 1 CSCI 402: Computer Architectures The Processor (4) Fengguang

More information

Wide Instruction Fetch

Wide Instruction Fetch Wide Instruction Fetch Fall 2007 Prof. Thomas Wenisch http://www.eecs.umich.edu/courses/eecs470 edu/courses/eecs470 block_ids Trace Table pre-collapse trace_id History Br. Hash hist. Rename Fill Table

More information

Hakim Weatherspoon CS 3410 Computer Science Cornell University

Hakim Weatherspoon CS 3410 Computer Science Cornell University Hakim Weatherspoon CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer. memory inst register

More information

EECS 470. Lecture 16 Virtual Memory. Fall 2018 Jon Beaumont

EECS 470. Lecture 16 Virtual Memory. Fall 2018 Jon Beaumont Lecture 16 Virtual Memory Fall 2018 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, and

More information

EECS 470. Branches: Address prediction and recovery (And interrupt recovery too.) Lecture 6 Winter 2018

EECS 470. Branches: Address prediction and recovery (And interrupt recovery too.) Lecture 6 Winter 2018 EECS 470 Branches: Address prediction and recovery (And interrupt recovery too.) Lecture 6 Winter 2018 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen,

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture 1 L E C T U R E 0 J A N L E M E I R E Course Objectives 2 Intel 4004 1971 2.3K trans. Intel Core 2 Duo 2006 291M trans. Where have all the transistors gone? Turing Machine

More information

Lecture 4: ISA Tradeoffs (Continued) and Single-Cycle Microarchitectures

Lecture 4: ISA Tradeoffs (Continued) and Single-Cycle Microarchitectures Lecture 4: ISA Tradeoffs (Continued) and Single-Cycle Microarchitectures ISA-level Tradeoffs: Instruction Length Fixed length: Length of all instructions the same + Easier to decode single instruction

More information

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control EE 37 Unit Single-Cycle CPU path and Control CPU Organization Scope We will build a CPU to implement our subset of the MIPS ISA Memory Reference Instructions: Load Word (LW) Store Word (SW) Arithmetic

More information

CMSC Computer Architecture Lecture 2: ISA. Prof. Yanjing Li Department of Computer Science University of Chicago

CMSC Computer Architecture Lecture 2: ISA. Prof. Yanjing Li Department of Computer Science University of Chicago CMSC 22200 Computer Architecture Lecture 2: ISA Prof. Yanjing Li Department of Computer Science University of Chicago Administrative Stuff! Lab1 is out! " Due next Thursday (10/6)! Lab2 " Out next Thursday

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Lec 25: Parallel Processors. Announcements

Lec 25: Parallel Processors. Announcements Lec 25: Parallel Processors Kavita Bala CS 340, Fall 2008 Computer Science Cornell University PA 3 out Hack n Seek Announcements The goal is to have fun with it Recitations today will talk about it Pizza

More information

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt

More information

COSC 6385 Computer Architecture - Pipelining

COSC 6385 Computer Architecture - Pipelining COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage

More information

EECS 470. Lecture 15. Prefetching. Fall 2018 Jon Beaumont. History Table. Correlating Prediction Table

EECS 470. Lecture 15. Prefetching. Fall 2018 Jon Beaumont.   History Table. Correlating Prediction Table Lecture 15 History Table Correlating Prediction Table Prefetching Latest A0 A0,A1 A3 11 Fall 2018 Jon Beaumont A1 http://www.eecs.umich.edu/courses/eecs470 Prefetch A3 Slides developed in part by Profs.

More information

14:332:331 Pipelined Datapath

14:332:331 Pipelined Datapath 14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate

More information

Computer Systems Architecture Spring 2016

Computer Systems Architecture Spring 2016 Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Pipelining: Basic Concepts

Pipelining: Basic Concepts Pipelining: Basic Concepts Prof. Cristina Silvano Dipartimento di Elettronica e Informazione Politecnico di ilano email: silvano@elet.polimi.it Outline Reduced Instruction Set of IPS Processor Implementation

More information

Where We Are in This Course Right Now. ECE 152 Introduction to Computer Architecture. This Unit: Caches and Memory Hierarchies.

Where We Are in This Course Right Now. ECE 152 Introduction to Computer Architecture. This Unit: Caches and Memory Hierarchies. Introduction to Computer Architecture Caches and emory Hierarchies Copyright 2012 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) and Alvin Lebeck (Duke) Spring 2012 Where

More information

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Why do I need four computing cores on my phone?! Why do I need eight computing

More information

CSEE 3827: Fundamentals of Computer Systems

CSEE 3827: Fundamentals of Computer Systems CSEE 3827: Fundamentals of Computer Systems Lecture 15 April 1, 2009 martha@cs.columbia.edu and the rest of the semester Source code (e.g., *.java, *.c) (software) Compiler MIPS instruction set architecture

More information

Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) Instruction Set Architecture (ISA)... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data

More information

Instructor Information

Instructor Information CS 203A Advanced Computer Architecture Lecture 1 1 Instructor Information Rajiv Gupta Office: Engg.II Room 408 E-mail: gupta@cs.ucr.edu Tel: (951) 827-2558 Office Times: T, Th 1-2 pm 2 1 Course Syllabus

More information

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23) Lecture Topics Today: Single-Cycle Processors (P&H 4.1-4.4) Next: continued 1 Announcements Milestone #3 (due 2/9) Milestone #4 (due 2/23) Exam #1 (Wednesday, 2/15) 2 1 Exam #1 Wednesday, 2/15 (3:00-4:20

More information

CS 101, Mock Computer Architecture

CS 101, Mock Computer Architecture CS 101, Mock Computer Architecture Computer organization and architecture refers to the actual hardware used to construct the computer, and the way that the hardware operates both physically and logically

More information

CPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner

CPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner CPS104 Computer Organization and Programming Lecture 19: Pipelining Robert Wagner cps 104 Pipelining..1 RW Fall 2000 Lecture Overview A Pipelined Processor : Introduction to the concept of pipelined processor.

More information

Advanced processor designs

Advanced processor designs Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The

More information

LECTURE 3: THE PROCESSOR

LECTURE 3: THE PROCESSOR LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU

More information

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel

More information

Full Datapath. Chapter 4 The Processor 2

Full Datapath. Chapter 4 The Processor 2 Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory

More information

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)

More information

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good CPU performance equation: T = I x CPI x C Both effective CPI and clock cycle C are heavily influenced by CPU design. For single-cycle CPU: CPI = 1 good Long cycle time bad On the other hand, for multi-cycle

More information

Course web site: teaching/courses/car. Piazza discussion forum:

Course web site:   teaching/courses/car. Piazza discussion forum: Announcements Course web site: http://www.inf.ed.ac.uk/ teaching/courses/car Lecture slides Tutorial problems Courseworks Piazza discussion forum: http://piazza.com/ed.ac.uk/spring2018/car Tutorials start

More information

Basic Computer Architecture

Basic Computer Architecture Basic Computer Architecture CSCE 496/896: Embedded Systems Witawas Srisa-an Review of Computer Architecture Credit: Most of the slides are made by Prof. Wayne Wolf who is the author of the textbook. I

More information

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John Computer Performance Evaluation and Benchmarking EE 382M Dr. Lizy Kurian John Evolution of Single-Chip Transistor Count 10K- 100K Clock Frequency 0.2-2MHz Microprocessors 1970 s 1980 s 1990 s 2010s 100K-1M

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

ECE 154A Introduction to. Fall 2012

ECE 154A Introduction to. Fall 2012 ECE 154A Introduction to Computer Architecture Fall 2012 Dmitri Strukov Lecture 10 Floating point review Pipelined design IEEE Floating Point Format single: 8 bits double: 11 bits single: 23 bits double:

More information

CS146 Computer Architecture. Fall Midterm Exam

CS146 Computer Architecture. Fall Midterm Exam CS146 Computer Architecture Fall 2002 Midterm Exam This exam is worth a total of 100 points. Note the point breakdown below and budget your time wisely. To maximize partial credit, show your work and state

More information

CSE140: Components and Design Techniques for Digital Systems

CSE140: Components and Design Techniques for Digital Systems CSE4: Components and Design Techniques for Digital Systems Tajana Simunic Rosing Announcements and Outline Check webct grades, make sure everything is there and is correct Pick up graded d homework at

More information

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

EE282 Computer Architecture. Lecture 1: What is Computer Architecture? EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer

More information

CPU Pipelining Issues

CPU Pipelining Issues CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput

More information

Alternate definition: Instruction Set Architecture (ISA) What is Computer Architecture? Computer Organization. Computer structure: Von Neumann model

Alternate definition: Instruction Set Architecture (ISA) What is Computer Architecture? Computer Organization. Computer structure: Von Neumann model What is Computer Architecture? Structure: static arrangement of the parts Organization: dynamic interaction of the parts and their control Implementation: design of specific building blocks Performance:

More information