EECS 470 Lecture 2. Performance, Power & ISA. Fall 2018, Jon Beaumont
1 Performance, Power & ISA. Fall 2018, Jon Beaumont. Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Mudge, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, University of Pennsylvania, and University of Wisconsin. Slide 1
2 Warm Up Riddle. A car must drive 2 miles. It drives the first mile at an average speed of V_1. How fast must it travel during the second mile so that its total average speed is twice that of the first mile (i.e., V_Total = 2*V_1)? (Vote here: etc.ch/zwn) a) b) ½ V_1 c) 2 V_1 d) 4 V_1 e) Other Slide 2
3 Last Time. Class logistics. Discussed high-level goals of computer architecture: performance; power; cost, security, ease of programmability, etc. Discussed how to increase program performance, mostly through adding parallelism. Limits of parallelism: Amdahl's Law. Slide 3
4 Today Dive into performance metrics a bit more Quantifying performance (throughput and latency) Discuss arithmetic of averages ISA overview Von Neumann architecture CISC vs RISC Power and Energy Start on 5-stage processor and pipeline review Slide 4
5 Administrative. Lab 1 due Thursday at 4:29pm; check off with GSI in OH. Project 1 due Saturday at 11:59pm; 9 submissions so far. Don't leave it to the last minute. HW 1 due next Tuesday (9/18) at 11:59pm; submit to Gradescope (see website). Should cover all material by Wednesday. Everyone have access to Canvas/Piazza/Gradescope? Do I have everyone's picture? Slide 5
6 Performance and Power Trends. Source: Chris Batten Dissertation, MIT (2010) Slide 6
7 Performance: Two Definitions. Latency (execution time): time to finish a fixed task. Throughput (bandwidth): number of tasks in fixed time. Very different: throughput can exploit parallelism, latency can't (baking bread analogy). Often contradictory; choose the definition that matches the measurement goals. Example: move people from A to B, 10 miles. Car: capacity = 5, speed = 60 miles/hour. Bus: capacity = 60, speed = 20 miles/hour. Latency: car = 10 min, bus = 30 min. Throughput: car = 30 PPH, bus = 120 PPH. Slide 7
8 Performance Improvement. Processor A is X times faster than processor B if Latency(P,A) = Latency(P,B) / X and Throughput(P,A) = Throughput(P,B) * X. Processor A is X% faster than processor B if Latency(P,A) = Latency(P,B) / (1+X/100) and Throughput(P,A) = Throughput(P,B) * (1+X/100). Car/bus example: Latency? Car is 3 times (and 200%) faster than bus. Throughput? Bus is 4 times (and 300%) faster than car. Slide 8
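The car/bus comparison can be checked numerically. The sketch below is only an illustration: it reads the example as a 10-mile trip with a 5-seat car at 60 miles/hour and a 60-seat bus at 20 miles/hour, counts one-way trips only, and uses hypothetical function names that do not come from the slides.

```python
def latency_minutes(distance_miles, speed_mph):
    """Time for a single one-way trip, in minutes."""
    return distance_miles * 60 / speed_mph


def throughput_pph(capacity, distance_miles, speed_mph):
    """People delivered per hour, counting one-way trips only (assumption)."""
    return capacity * speed_mph / distance_miles


def times_faster_by_latency(latency_a, latency_b):
    """X such that Latency(A) = Latency(B) / X."""
    return latency_b / latency_a


def times_faster_by_throughput(throughput_a, throughput_b):
    """X such that Throughput(A) = Throughput(B) * X."""
    return throughput_a / throughput_b


def percent_faster(times):
    """Convert 'X times faster' into 'X% faster': 3 times -> 200%."""
    return (times - 1) * 100


DISTANCE_MILES = 10
car_latency = latency_minutes(DISTANCE_MILES, 60)
bus_latency = latency_minutes(DISTANCE_MILES, 20)
car_throughput = throughput_pph(5, DISTANCE_MILES, 60)
bus_throughput = throughput_pph(60, DISTANCE_MILES, 20)

latency_ratio = times_faster_by_latency(car_latency, bus_latency)
throughput_ratio = times_faster_by_throughput(bus_throughput, car_throughput)
print("car vs bus (latency):", latency_ratio, "times,", percent_faster(latency_ratio), "%")
print("bus vs car (throughput):", throughput_ratio, "times,", percent_faster(throughput_ratio), "%")
```

Which vehicle "wins" depends entirely on which metric is chosen, which is the point of the slide.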
9 Latency vs Throughput What are three computing applications where we care mostly about throughput? What about latency? Slide 9
10 Averaging Performance Numbers I. You can add latencies, but not throughputs. Latency(P1+P2, A) = Latency(P1,A) + Latency(P2,A). Throughput(P1+P2,A) != Throughput(P1,A) + Throughput(P2,A). E.g., 1 mile at 30 miles/hour + 1 mile at 90 miles/hour: the average is not 60 miles/hour. 0.033 hours at 30 miles/hour + 0.011 hours at 90 miles/hour: the average is only 45 miles/hour! (2 miles / (0.033 + 0.011 hours)) Slide 10
11 Averaging Performance Numbers II. Latency(P1+P2, A) = Latency(P1,A) + Latency(P2,A). $\text{Throughput}(P1+P2, A) = \dfrac{1}{\dfrac{1}{\text{Throughput}(P1,A)} + \dfrac{1}{\text{Throughput}(P2,A)}}$. Three averaging techniques. Arithmetic: $\frac{1}{N} \sum_{P=1}^{N} \text{Latency}(P)$; for times: units proportional to time (e.g., latency). Harmonic: $N \big/ \sum_{P=1}^{N} \frac{1}{\text{Throughput}(P)}$; for rates: units inversely proportional to time (e.g., throughput), unless time is fixed. Geometric: $\sqrt[N]{\prod_{P=1}^{N} \text{Speedup}(P)}$; for ratios: unitless quantities (e.g., speedups). Slide 11
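The three averages can be sketched in a few lines. This is a minimal illustration with hypothetical function names; the 30 and 90 miles/hour inputs are the two one-mile legs from the averaging example, and the speedup values are made up.

```python
import math


def arithmetic_mean(times):
    """For quantities proportional to time (e.g., latencies)."""
    return sum(times) / len(times)


def harmonic_mean(rates):
    """For rates over equal amounts of work (e.g., throughputs)."""
    return len(rates) / sum(1 / r for r in rates)


def geometric_mean(ratios):
    """For unitless ratios (e.g., speedups)."""
    return math.prod(ratios) ** (1 / len(ratios))


speeds_mph = [30, 90]  # one mile driven at each speed
print("naive arithmetic mean of speeds:", arithmetic_mean(speeds_mph))
print("harmonic mean of speeds:", harmonic_mean(speeds_mph))
print("geometric mean of speedups 2x and 8x:", geometric_mean([2, 8]))
```

The harmonic mean matches total distance divided by total time, which is why it is the right average when the work (here, one mile per leg) is fixed.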
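12 The Iron Law of Processor Performance. Processor Performance = Time / Program: $\dfrac{\text{Time}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Cycles}}{\text{Instruction}} \times \dfrac{\text{Time}}{\text{Cycle}}$, i.e., (code size) x (CPI) x (cycle time). Architecture --> Implementation --> Realization: Compiler Designer, Processor Designer, Chip Designer. Slide 12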
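As a quick sanity check, the Iron Law is just a product of three factors. The numbers below are hypothetical (they are not from the slides) and only show how each factor scales execution time.

```python
def execution_time(instructions, cpi, cycle_time_s):
    """Iron Law: time/program = instructions/program * cycles/instruction * time/cycle."""
    return instructions * cpi * cycle_time_s


# Hypothetical program: 1 billion dynamic instructions, CPI of 2, 2 ns cycle (500 MHz).
baseline = execution_time(1_000_000_000, 2.0, 2e-9)
# Halving any one factor halves the execution time.
better_cpi = execution_time(1_000_000_000, 1.0, 2e-9)
fewer_instructions = execution_time(500_000_000, 2.0, 2e-9)
print(baseline, better_cpi, fewer_instructions)
```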
13 Danger: Partial Performance Metrics. Micro-architects often ignore dynamic instruction count. They typically work in one ISA with one compiler and treat it as fixed. The Iron Law then reduces to seconds / instruction = (cycles / instruction) * (seconds / cycle). MIPS (millions of instructions per second) = instructions / second * 10^-6. Cycles / second: clock frequency (in MHz). Example: CPI = 2, clock = 500 MHz, what is MIPS? 0.5 instructions/cycle * 500 * 10^6 cycles/second * 10^-6 = 250 MIPS. Problem: when the compiler removes instructions, the program gets faster; however, MIPS goes down (misleading). Slide 13
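The MIPS pitfall can be made concrete with a small, hypothetical scenario (the instruction and cycle counts below are invented for illustration): a compiler removes 200 million cheap single-cycle instructions from a program running at 500 MHz with an average CPI of 2.

```python
CLOCK_HZ = 500e6  # 500 MHz


def seconds(cycles, clock_hz=CLOCK_HZ):
    """Execution time for a given number of clock cycles."""
    return cycles / clock_hz


def mips(instructions, run_seconds):
    """Millions of instructions per second."""
    return instructions / (run_seconds * 1e6)


# Before optimization: 1e9 instructions at CPI 2 -> 2e9 cycles.
before_instructions, before_cycles = 1_000_000_000, 2_000_000_000
# After: 200M one-cycle instructions removed (assumed), so 200M fewer cycles.
after_instructions, after_cycles = 800_000_000, 1_800_000_000

before_time = seconds(before_cycles)
after_time = seconds(after_cycles)
before_mips = mips(before_instructions, before_time)
after_mips = mips(after_instructions, after_time)

print(f"before: {before_time:.2f} s, {before_mips:.1f} MIPS")
print(f"after:  {after_time:.2f} s, {after_mips:.1f} MIPS")
```

The optimized program finishes sooner, yet its MIPS rating is lower because the remaining instructions have a higher average CPI.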
14 Danger: Partial Performance Metrics II. Micro-architects often ignore instructions/program, but the general public (mostly) also ignores CPI and equates clock frequency with performance! Which processor would you buy? Processor A: CPI = 2, clock = 500 MHz. Processor B: CPI = 1, clock = 300 MHz. Probably A, but B is faster (assuming same ISA/compiler). Classic example: 800 MHz Pentium III faster than 1 GHz Pentium 4 (same ISA and compiler). Slide 14
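A short check of the processor comparison, reading the clocks as 500 MHz and 300 MHz and assuming both processors run the same instruction stream. The function name is hypothetical.

```python
def instructions_per_second(cpi, clock_hz):
    """With a fixed instruction count, performance is proportional to clock / CPI."""
    return clock_hz / cpi


processor_a = instructions_per_second(cpi=2, clock_hz=500e6)
processor_b = instructions_per_second(cpi=1, clock_hz=300e6)
print("A:", processor_a / 1e6, "MIPS")
print("B:", processor_b / 1e6, "MIPS")
print("B is", processor_b / processor_a, "times as fast as A")
```

The conclusion depends only on the ratio of clock to CPI, so the higher-clocked processor A loses here.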
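15 Performance Key Points. Amdahl's law: $S_{\text{overall}} = \dfrac{1}{(1-f) + \dfrac{f}{S}}$. Iron law: $\dfrac{\text{Time}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Cycles}}{\text{Instruction}} \times \dfrac{\text{Time}}{\text{Cycle}}$. Averaging techniques: Arithmetic (times): $\dfrac{1}{n}\sum_{i=1}^{n} \text{Time}_i$. Harmonic (rates): $\dfrac{n}{\sum_{i=1}^{n} \dfrac{1}{\text{Rate}_i}}$. Geometric (ratios): $\sqrt[n]{\prod_{i=1}^{n} \text{Ratio}_i}$. Slide 15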
16 Instruction Set Architecture Slide 16
17 Instruction Set Architecture. "Instruction set architecture (ISA) is the structure of a computer that a machine language programmer (or a compiler) must understand to write a correct (timing independent) program for that machine" (IBM, introducing the 360). The IBM 360 is a family of binary-compatible machines with distinct microarchitectures and technologies, ranging from Model 30 (8-bit datapath, up to 64KB memory) to Model 70 (64-bit datapath, 512KB memory) and later Model 360/91 (the Tomasulo). The IBM 360 replaced 4 concurrent, but incompatible lines of IBM architectures developed over the previous 10 years. Slide 17
18 ISA: A contract between HW and SW ISA (instruction set architecture) A well-defined hardware/software interface The contract between software and hardware Functional definition of operations, modes, and storage locations supported by hardware Precise description of how to invoke, and access them No guarantees regarding How operations are implemented Which operations are fast and which are slow and when Which operations take more power and which take less Slide 18
19 von Neumann Model of a Computer. Key idea: memory contains both instructions and data. Instructions can be operated on as if they are data. Self-modifying code is mostly discouraged now, but compilers take a program as input and produce another program! Turing machines are vN machines. Slide 19
20 Sequential Model of Computing. Each instruction is executed one after the other. Branch instructions can change this order, possibly conditionally. Tied to a program counter. The microarchitectures that we will study conform to the sequential execution model, but under the hood they execute instructions out-of-order (OoO). Other models? Dataflow? Slide 20
21 Components of an ISA. Programmer-visible states: program counter, general purpose registers, memory, control registers. Programmer-visible behaviors (state transitions): what to do, when to do it. Example register-transfer-level description of an instruction: if imem[pc] == "add rd, rs, rt" then pc <- pc+1; gpr[rd] <- gpr[rs] + gpr[rt]. A binary encoding. ISAs last 25+ years (because of SW cost), so be careful what goes in. Slide 21
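The register-transfer-level description of add can be mirrored in a tiny executable model. This is only a sketch: the tuple encoding of instructions, the eight-register file, and the names MachineState and step are assumptions made for illustration, not part of any ISA discussed in the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class MachineState:
    """Programmer-visible state: a program counter and general purpose registers."""
    pc: int = 0
    gpr: list = field(default_factory=lambda: [0] * 8)  # 8 registers (assumed)


def step(state, imem):
    """Apply the state transition for the instruction at imem[pc]."""
    opcode, rd, rs, rt = imem[state.pc]
    if opcode == "add":
        # gpr[rd] <- gpr[rs] + gpr[rt]; pc <- pc + 1
        state.gpr[rd] = state.gpr[rs] + state.gpr[rt]
        state.pc = state.pc + 1
    else:
        raise ValueError(f"unsupported opcode: {opcode}")
    return state


state = MachineState()
state.gpr[1], state.gpr[2] = 5, 7
program = [("add", 3, 1, 2)]  # add r3, r1, r2 in the hypothetical encoding
step(state, program)
print(state.pc, state.gpr)
```

Note that the model says nothing about timing or implementation, which is exactly what the ISA contract leaves unspecified.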
22 RISC vs CISC Recall Iron law: (instructions/program) * (cycles/instruction) * (seconds/cycle) CISC (Complex Instruction Set Computing) Improve instructions/program with complex instructions Easy for assembly-level programmers, good code density RISC (Reduced Instruction Set Computing) Improve cycles/instruction with many single-cycle instructions Increases instruction/program, but hopefully not as much Help from smart compiler Perhaps improve clock cycle time (seconds/cycle) via aggressive implementation allowed by simpler instructions Slide 22
23 What Makes a Good ISA? Programmability: easy to express programs efficiently? Implementability: easy to design high-performance implementations? More recently: easy to design low-power implementations? Easy to design high-reliability implementations? Easy to design low-cost implementations? Compatibility: easy to maintain programmability (implementability) as languages and programs (technology) evolve? x86 (IA32) generations: 8086, 286, 386, 486, Pentium, PentiumII, PentiumIII, Pentium4, Slide 23
24 Typical Instructions (Opcodes)
Type                      Example Instructions
Arithmetic and logical    and, add
Data transfer             move, load
Control                   branch, jump, call, return
System                    trap, rett
Floating point            add, mul, div, sqrt
Decimal                   addd, convert
String                    move, compare
What operations are necessary? {sub, ld & st, conditional br.} What is the minimum complete ISA for a von Neumann machine? Too little or too simple: not expressive enough; difficult to program (by hand); programs tend to be bigger. Too much or too complex: most of it won't be used; too much baggage for implementation; difficult choices during compiler optimization. Slide 24
25 Power Slide 25
26 Introduction. Why is power a problem in a μp? Power used by the μp vs. system power. Dissipating heat: melting (very bad); packaging (to cool, $); heat leads to poorer performance. Providing power: battery; cost of electricity. Slide 26
27 Where does the juice go in laptops? Others have measured ~55% processor increase under max load in laptops [Hsu+Kremer, 2002] Slide 27
28 What about servers? [Pie chart: SunFire T2000 power by component: Processor, Memory, I/O, Disk, Services, Fans, AC/DC Conversion] DRAM: >20% and growing. CPU: <25% and shrinking. AC to DC conversion is only 60-90% efficient. Need whole-system approaches to save energy. Slide 28
29 Why is power a problem? Why worry about power dissipation? Battery life Thermal issues: affect cooling, packaging, reliability, timing Environment Slide 29
30 Why is power a problem? Total Power Dissipation Trends. [Figure: power density (W/cm^2) for Pentium, Pentium Pro, Pentium 2, Pentium 3, Pentium 4, and Pentium 4 (Prescott), compared against a hot plate and a nuclear reactor] Slide 30
31 Why is power a problem? Spot Heat Issues in Microprocessors Slide 31
32 Why is power a problem? Packaging cost: complex and expensive (note heatpipe). Source: H. Xie et al., "Packaging the Itanium Microprocessor", Electronic Components and Technology Conference 2002 Slide 32
33 Temperature/di-dt-Constrained Power-Aware Computing Applications Energy-Constrained Computing Slide 33
34 Data center energy use. Installed base grows 11%/yr. By 2011: 2.5% of US energy, $7.4 billion/yr. 0.5% of world CO2 emissions; rivals entire Czech Republic. [Chart: CO2 emissions (mil. metric tons) for Nigeria, data centers, and the Czech Republic] Sources: US EPA; Mankoff et al., IEEE Computer. Improving energy efficiency is a critical challenge. Slide 34
35 Where does all the power go? [Pie chart with segments IT Equipment, Cooling, UPS, Power Delivery, Lighting; values shown: 52%, 38%, 5%, 4%, 1%] Source: Liebert 2007. Servers account for barely half of power. 1W of cooling per 1.5W of IT load. 1W data center: cooling costs $4 to $8 / yr. System designers must think about cooling. Slide 35
36 Why is power a problem? Power-aware design is needed across all computing platforms. Mobile/portable (cell phones, laptops, PDA): battery life is critical. Desktops/set-top (PCs and game machines): packaging cost is critical. Servers (mainframes and compute-farms): packaging limits; volumetric (performance density). Slide 36
37 What uses power in a chip Slide 37
38 What uses power in a chip? How CMOS Transistors Work Slide 38
39 What uses power in a chip? MOS Transistors are Switches Slide 39
40 What uses power in a chip? Power: The Basics. Dynamic power vs. static power. Dynamic: switching power. Static: leakage power. Dynamic power dominates, but static power is increasing in importance. Static power: steady, per-cycle energy cost. Dynamic power: capacitive and short-circuit. Capacitive power: charging/discharging at transitions from 0->1 and 1->0. Short-circuit power: power due to brief short-circuit current during transitions. Slide 40
41 What uses power in a chip? Dynamic (Capacitive) Power Dissipation. [Figure: CMOS gate with input V_IN, output V_OUT, switching current I, and load capacitance C_L] Data dependent: a function of switching activity. Slide 41
42 What uses power in a chip? Capacitive Power Dissipation. $P \approx \tfrac{1}{2} C V^2 A f$. Capacitance (C): function of wire length, transistor size. Supply voltage (V): has been dropping with successive fab generations. Activity factor (A): how often, on average, do wires switch? Clock frequency (f): increasing. Slide 42
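The capacitive power relation is easy to explore numerically. The component values below are hypothetical and chosen only to show the quadratic dependence on supply voltage and the linear dependence on frequency.

```python
def dynamic_power(capacitance_f, voltage_v, activity, frequency_hz):
    """Capacitive switching power: P ~ 1/2 * C * V^2 * A * f (watts)."""
    return 0.5 * capacitance_f * voltage_v ** 2 * activity * frequency_hz


# Hypothetical values: 1 nF switched capacitance, 1.0 V, activity 0.5, 1 GHz.
base = dynamic_power(1e-9, 1.0, 0.5, 1e9)
half_voltage = dynamic_power(1e-9, 0.5, 0.5, 1e9)    # V halved -> power / 4
half_frequency = dynamic_power(1e-9, 1.0, 0.5, 0.5e9)  # f halved -> power / 2
print(base, half_voltage, half_frequency)
```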
43 What uses power in a chip? Power vs. Energy. Power consumption in watts: the rate at which energy is consumed over time; determines battery life in hours; sets packaging limits. Energy efficiency in joules: energy = power * delay (joules = watts * seconds). A lower energy number means less power to perform a computation at the same frequency. Slide 43
44 What uses power in a chip? Power vs. Energy Slide 44
45 Energy vs Power What are three computing applications where we care about energy more than power? What about power over energy? Slide 45
46 What uses power in a chip? Voltage Scaling. Scenario: 80W, 1 BIPS, 1.5V, 1GHz. Cache optimization: IPC decreases by 10%, reduces power by 20% => final processor: 900 MIPS, 64W. What if we just adjust frequency/voltage on the processor? How to reduce power by 20%? P = CV^2 F, which scales as CV^3 when frequency scales with voltage => drop voltage by 7% (and also frequency) => 0.93 * 0.93 * 0.93 = 0.8x. So for equal power (64W): cache optimization = 900 MIPS; simple voltage/frequency scaling = 930 MIPS. Slide 46
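The voltage-scaling arithmetic can be reproduced directly, reading the scenario as an 80W, 1000 MIPS baseline. The sketch assumes frequency scales linearly with voltage (so power scales with the cube of the scaling factor) and that performance scales linearly with frequency; function names are hypothetical.

```python
def scaled_power(power_w, scale):
    """Power after scaling voltage and frequency together: P ~ C * V^2 * F ~ V^3."""
    return power_w * scale ** 3


def scaled_mips(mips, scale):
    """Performance after scaling frequency (assumes performance ~ frequency)."""
    return mips * scale


def scale_for_power_ratio(power_ratio):
    """Voltage/frequency scale needed to hit a target power ratio."""
    return power_ratio ** (1 / 3)


BASE_POWER_W, BASE_MIPS = 80, 1000
scale = 0.93  # drop voltage and frequency by 7%
print("power:", scaled_power(BASE_POWER_W, scale), "W")
print("performance:", scaled_mips(BASE_MIPS, scale), "MIPS")
print("exact scale for a 20% power cut:", scale_for_power_ratio(0.8))
```

Under these assumptions, plain voltage/frequency scaling reaches roughly the same 64W budget while giving up less performance than the cache optimization.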
47 Multicore: Solution to Power-constrained design? Power scales roughly cubically with frequency. Scale clock frequency to 80%. Now add a second core. Same power budget, but 1.6x performance! But: must parallelize the application. Remember Amdahl's Law! [Chart: performance and power for each configuration] Slide 47
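The multicore trade-off combines the cubic power rule with Amdahl's Law. The sketch below is a simplified model with hypothetical function names: it assumes identical cores, power proportional to frequency cubed, and performance proportional to frequency.

```python
def relative_power(cores, freq_scale):
    """Total power relative to one core at full frequency (power ~ f^3 per core)."""
    return cores * freq_scale ** 3


def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's Law: speedup from running a fraction f of the work on several cores."""
    return 1 / ((1 - parallel_fraction) + parallel_fraction / cores)


def relative_performance(parallel_fraction, cores, freq_scale):
    """Performance relative to one core at full frequency."""
    return freq_scale * amdahl_speedup(parallel_fraction, cores)


print("power, 2 cores at 80%:", relative_power(2, 0.8))
for f in (1.0, 0.8, 0.4, 0.0):
    print(f"parallel fraction {f}: {relative_performance(f, 2, 0.8):.2f}x")
```

In this model the two slower cores only beat the original single core when more than 40% of the work parallelizes; the 1.6x figure is the fully parallel best case.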
48 The Execution Core: Pipelining Slide 48
49 Outline: Understanding the Execution Core. 1. 5-stage pipeline (review) 2. Implementing pipeline interlocks (review) 3. Scoreboard scheduling (CDC 6600) 4. Tomasulo's OoO scheduling algorithm (IBM 360) 5. Precise interrupts with a Reorder Buffer (P6, Core) 6. Modern OoO (MIPS R10K, Alpha 21264, Netburst) Slide 49
50 Before there was pipelining. Single-cycle: insn0.fetch, dec, exec | insn1.fetch, dec, exec. Multi-cycle: insn0.fetch | insn0.dec | insn0.exec | insn1.fetch | insn1.dec | insn1.exec. Basic datapath: fetch, decode, execute. Single-cycle control: hardwired. + Low CPI (1). - Long clock period (to accommodate slowest instruction). Multi-cycle control: micro-programmed. + Short clock period. - High CPI. Slide 50
51 Speeding Up. Single-cycle: insn0.fetch, dec, exec | insn1.fetch, dec, exec. Multi-cycle: insn0.fetch | insn0.dec | insn0.exec | insn1.fetch | insn1.dec | insn1.exec. Remember, three ways to speed up a process: reduce number of tasks (possible?); decrease latency of tasks (possible?); parallelize. How do we parallelize this pipeline? Slide 51
52 Parallelize. Multi-cycle: insn0.fetch | insn0.dec | insn0.exec | insn1.fetch | insn1.dec | insn1.exec. Duplicate the pipeline (superscalar): insn0 and insn1 each pass through fetch, dec, exec side by side. Effective, but expensive (>2x hardware overhead). Discuss more later in the semester. Or pipeline! Slide 52
53 Pipelining. Multi-cycle: insn0.fetch | insn0.dec | insn0.exec | insn1.fetch | insn1.dec | insn1.exec. Pipelined: insn1.fetch overlaps insn0.dec, and insn1.dec overlaps insn0.exec. Important performance technique. Improves throughput at the expense of latency. Why does latency go up? Begin with the multi-cycle design. When an instruction advances from stage 1 to 2, allow the next instruction to enter stage 1. Each instruction still passes through all stages. + But instructions enter and leave at a much faster rate. Not much hardware overhead (what needs to be added?) Slide 53
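The throughput gain from pipelining can be sketched by counting cycles. This is an idealized model with hypothetical function names: it assumes one cycle per stage and no stalls or hazards, and it ignores the slightly longer clock period that pipeline registers may add.

```python
def multicycle_cycles(num_instructions, num_stages):
    """Multi-cycle design: each instruction finishes all stages before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """Ideal pipeline: fill the pipeline once, then finish one instruction per cycle."""
    return num_stages + (num_instructions - 1)


n, stages = 5, 5
print("multi-cycle:", multicycle_cycles(n, stages), "cycles")
print("pipelined:  ", pipelined_cycles(n, stages), "cycles")
print("speedup for 1000 instructions:",
      multicycle_cycles(1000, stages) / pipelined_cycles(1000, stages))
```

Each instruction still takes all five cycles to complete, so per-instruction latency does not improve; only the rate at which instructions finish does, approaching the number of stages for long instruction streams.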
54 Pipeline Illustrated: [Figure: a latch (L) followed by combinational logic of n gate delays, BW = ~(1/n); two stages, each a latch plus n/2 gate delays, BW = ~(2/n); three stages, each a latch plus n/3 gate delays, BW = ~(3/n)] Slide 54
55 370 Processor Pipeline Review. Stages: Fetch, Decode, Execute, Memory, (Write-back). [Figure: PC with +4 incrementer, I-cache, Reg File, ALU, D-cache] T_pipeline = T_base / 5 Slide 55
56 Stage 1: Fetch. Fetch an instruction from memory every cycle. Use PC to index memory. Increment PC (assume no branches for now). Write state to the pipeline register (IF/ID). The next stage will read this pipeline register. Note that the pipeline register must be edge triggered. Slide 56
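57 [Figure: fetch-stage datapath. The PC (with enable) indexes the instruction memory/cache and feeds a +1 adder; the IF/ID pipeline register (with enable) captures the instruction bits and PC+1 for the rest of the pipelined datapath] Slide 57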
58 Stage 2: Decode. Decodes opcode bits; may set up control signals for later stages. Reads input operands from the register file, specified by regA and regB of the instruction bits. Writes state to the pipeline register (ID/EX): opcode; register contents; offset & destination fields; PC+1 (even though decode didn't use it). Slide 58
59 [Figure: decode-stage datapath. The IF/ID pipeline register (instruction bits, PC+1) from the stage 1 fetch datapath feeds the register file (regA, regB, destReg, data, write enable); the ID/EX pipeline register captures control signals, PC+1, contents of regA, and contents of regB for the rest of the pipelined datapath] Slide 59
60 Stage 3: Execute. Perform ALU operation. Input operands can be: contents of regA or regB; offset field of the instruction. Branches: calculate PC+1+offset. Write state to the pipeline register (EX/Mem): ALU result, contents of regB, and PC+1+offset; instruction bits for opcode and destReg specifiers. Slide 60
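61 [Figure: execute-stage datapath. The ID/EX pipeline register (control signals, PC+1, contents of regA, contents of regB) from the stage 2 decode datapath feeds an adder computing PC+1+offset and the ALU; the EX/Mem pipeline register captures control signals, PC+1+offset, ALU result, and contents of regB for the rest of the pipelined datapath] Slide 61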
62 Stage 4: Memory Operation. Perform data cache access for memory ops. ALU result contains the address for ld and st. Opcode bits control memory R/W and enable signals. Write state to the pipeline register (Mem/WB): ALU result and Memdata; instruction bits for opcode and destReg specifiers. Slide 62
63 [Figure: memory-stage datapath. The EX/Mem pipeline register from the stage 3 execute datapath feeds the data memory (enable, R/W); the Mem/WB pipeline register captures control signals, ALU result, and memory read data for the rest of the pipelined datapath. PC+1+offset goes back to the MUX before the PC in stage 1, together with the MUX control for the PC input] Slide 63
64 Stage 5: Write back. Write result to the register file (if required). Write Memdata to destReg for a ld instruction. Write ALU result to destReg for an arithmetic instruction. Opcode bits control the register write enable signal. Slide 64
65 [Figure: write-back datapath. From the Mem/WB pipeline register of the stage 4 memory datapath, the memory read data or ALU result goes back to the data input of the register file, the register write enable goes back to the register file, and the destination bits go back to the destination register specifier] Slide 65
66 Sample Code (Simple). Run the following code on a pipelined datapath:
add ; reg 3 = reg 1 + reg 2
nand ; reg 6 = ~(reg 4 & reg 5)
lw ; reg 4 = Mem[reg2+2]
add ; reg 5 = reg 2 + reg 5
sw ; Mem[reg3+1] = reg 7
Slide 66
67 [Figure: full pipelined datapath with PC, instruction memory, register file (R0-R7), ALU, data memory, and the IF/ID, ID/EX, EX/Mem, Mem/WB pipeline registers carrying instruction, PC+1, op, dest, offset, valA, valB, target, ALU result, eq?, and mdata] Slide 67
68 Initial State. [Figure: all pipeline registers hold noop] Slide 68
69 Time 1. Fetch: add. [Figure: IF/ID = add; ID/EX, EX/Mem, Mem/WB = noop] Slide 69
70 Time 2. Fetch: nand. [Figure: IF/ID = nand; ID/EX = add; EX/Mem, Mem/WB = noop] Slide 70
71 Time 3. Fetch: lw. [Figure: IF/ID = lw; ID/EX = nand; EX/Mem = add; Mem/WB = noop] Slide 71
72 Time 4. Fetch: add. [Figure: IF/ID = add; ID/EX = lw; EX/Mem = nand; Mem/WB = add] Slide 72
73 Time 5. Fetch: sw. [Figure: IF/ID = sw; ID/EX = add; EX/Mem = lw; Mem/WB = nand] Slide 73
74 Time 6. No more instructions. [Figure: ID/EX = sw; EX/Mem = add; Mem/WB = lw] Slide 74
75 Time 7. No more instructions. [Figure: EX/Mem = sw; Mem/WB = add] Slide 75
76 Time 8. No more instructions. [Figure: Mem/WB = sw] Slide 76
77 Time 9. No more instructions. [Figure: sw completes write-back; the pipeline is empty] Slide 77
78 Time graphs
Time:  1      2       3        4        5          6          7          8          9
add    fetch  decode  execute  memory   writeback
nand          fetch   decode   execute  memory     writeback
lw                    fetch    decode   execute    memory     writeback
add                            fetch    decode     execute    memory     writeback
sw                                      fetch      decode     execute    memory     writeback
Slide 78
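A time graph like the one above can be generated mechanically. The sketch below is an idealized model with hypothetical names: one instruction enters the pipeline per cycle and nothing ever stalls, so it does not model hazards.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def stage_at(index, cycle):
    """Stage occupied in `cycle` (1-based) by instruction number `index` (0-based)."""
    position = cycle - 1 - index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None  # not yet fetched, or already retired


def time_graph(instructions):
    """One text row per instruction, one column per cycle."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for index, name in enumerate(instructions):
        cells = [stage_at(index, c) or "" for c in range(1, total_cycles + 1)]
        body = "".join(f"{cell:<10}" for cell in cells).rstrip()
        rows.append(f"{name:<5}" + body)
    return rows


for row in time_graph(["add", "nand", "lw", "add", "sw"]):
    print(row)
```

With five instructions and five stages the last write-back lands in cycle 9, matching the final snapshot of the walkthrough.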
More informationAdvanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationComputer Architecture. Lecture 6.1: Fundamentals of
CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and
More informationComputer Architecture s Changing Definition
Computer Architecture s Changing Definition 1950s Computer Architecture Computer Arithmetic 1960s Operating system support, especially memory management 1970s to mid 1980s Computer Architecture Instruction
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture
More informationECE260: Fundamentals of Computer Engineering
Datapath for a Simplified Processor James Moscola Dept. of Engineering & Computer Science York College of Pennsylvania Based on Computer Organization and Design, 5th Edition by Patterson & Hennessy Introduction
More informationCSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI Contents 1.7 - End of Chapter 1 Power wall The multicore era
More informationECE 4750 Computer Architecture, Fall 2014 T01 Single-Cycle Processors
ECE 4750 Computer Architecture, Fall 2014 T01 Single-Cycle Processors School of Electrical and Computer Engineering Cornell University revision: 2014-09-03-17-21 1 Instruction Set Architecture 2 1.1. IBM
More informationCS 152 Computer Architecture and Engineering Lecture 4 Pipelining
CS 152 Computer rchitecture and Engineering Lecture 4 Pipelining 2014-1-30 John Lazzaro (not a prof - John is always OK) T: Eric Love www-inst.eecs.berkeley.edu/~cs152/ Play: 1 otorola 68000 Next week
More informationCS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism
CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste
More informationHomework 5. Start date: March 24 Due date: 11:59PM on April 10, Monday night. CSCI 402: Computer Architectures
Homework 5 Start date: March 24 Due date: 11:59PM on April 10, Monday night 4.1.1, 4.1.2 4.3 4.8.1, 4.8.2 4.9.1-4.9.4 4.13.1 4.16.1, 4.16.2 1 CSCI 402: Computer Architectures The Processor (4) Fengguang
More informationWide Instruction Fetch
Wide Instruction Fetch Fall 2007 Prof. Thomas Wenisch http://www.eecs.umich.edu/courses/eecs470 edu/courses/eecs470 block_ids Trace Table pre-collapse trace_id History Br. Hash hist. Rename Fill Table
More informationHakim Weatherspoon CS 3410 Computer Science Cornell University
Hakim Weatherspoon CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer. memory inst register
More informationEECS 470. Lecture 16 Virtual Memory. Fall 2018 Jon Beaumont
Lecture 16 Virtual Memory Fall 2018 Jon Beaumont http://www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, and
More informationEECS 470. Branches: Address prediction and recovery (And interrupt recovery too.) Lecture 6 Winter 2018
EECS 470 Branches: Address prediction and recovery (And interrupt recovery too.) Lecture 6 Winter 2018 Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen,
More informationProcessor (IV) - advanced ILP. Hwansoo Han
Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle
More informationAdvanced Computer Architecture
Advanced Computer Architecture 1 L E C T U R E 0 J A N L E M E I R E Course Objectives 2 Intel 4004 1971 2.3K trans. Intel Core 2 Duo 2006 291M trans. Where have all the transistors gone? Turing Machine
More informationLecture 4: ISA Tradeoffs (Continued) and Single-Cycle Microarchitectures
Lecture 4: ISA Tradeoffs (Continued) and Single-Cycle Microarchitectures ISA-level Tradeoffs: Instruction Length Fixed length: Length of all instructions the same + Easier to decode single instruction
More informationMark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control
EE 37 Unit Single-Cycle CPU path and Control CPU Organization Scope We will build a CPU to implement our subset of the MIPS ISA Memory Reference Instructions: Load Word (LW) Store Word (SW) Arithmetic
More informationCMSC Computer Architecture Lecture 2: ISA. Prof. Yanjing Li Department of Computer Science University of Chicago
CMSC 22200 Computer Architecture Lecture 2: ISA Prof. Yanjing Li Department of Computer Science University of Chicago Administrative Stuff! Lab1 is out! " Due next Thursday (10/6)! Lab2 " Out next Thursday
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationEECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141
EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 13 Project Introduction You will design and optimize a RISC-V processor Phase 1: Design
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationLec 25: Parallel Processors. Announcements
Lec 25: Parallel Processors Kavita Bala CS 340, Fall 2008 Computer Science Cornell University PA 3 out Hack n Seek Announcements The goal is to have fun with it Recitations today will talk about it Pizza
More information4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16
4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt
More informationCOSC 6385 Computer Architecture - Pipelining
COSC 6385 Computer Architecture - Pipelining Fall 2006 Some of the slides are based on a lecture by David Culler, Instruction Set Architecture Relevant features for distinguishing ISA s Internal storage
More informationEECS 470. Lecture 15. Prefetching. Fall 2018 Jon Beaumont. History Table. Correlating Prediction Table
Lecture 15 History Table Correlating Prediction Table Prefetching Latest A0 A0,A1 A3 11 Fall 2018 Jon Beaumont A1 http://www.eecs.umich.edu/courses/eecs470 Prefetch A3 Slides developed in part by Profs.
More information14:332:331 Pipelined Datapath
14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate
More informationComputer Systems Architecture Spring 2016
Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,
More informationEITF20: Computer Architecture Part2.2.1: Pipeline-1
EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle
More informationPipelining: Basic Concepts
Pipelining: Basic Concepts Prof. Cristina Silvano Dipartimento di Elettronica e Informazione Politecnico di ilano email: silvano@elet.polimi.it Outline Reduced Instruction Set of IPS Processor Implementation
More informationWhere We Are in This Course Right Now. ECE 152 Introduction to Computer Architecture. This Unit: Caches and Memory Hierarchies.
Introduction to Computer Architecture Caches and emory Hierarchies Copyright 2012 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) and Alvin Lebeck (Duke) Spring 2012 Where
More informationProf. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University. P & H Chapter 4.10, 1.7, 1.8, 5.10, 6
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University P & H Chapter 4.10, 1.7, 1.8, 5.10, 6 Why do I need four computing cores on my phone?! Why do I need eight computing
More informationCSEE 3827: Fundamentals of Computer Systems
CSEE 3827: Fundamentals of Computer Systems Lecture 15 April 1, 2009 martha@cs.columbia.edu and the rest of the semester Source code (e.g., *.java, *.c) (software) Compiler MIPS instruction set architecture
More informationInstruction Set Architecture (ISA)
Instruction Set Architecture (ISA)... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data
More informationInstructor Information
CS 203A Advanced Computer Architecture Lecture 1 1 Instructor Information Rajiv Gupta Office: Engg.II Room 408 E-mail: gupta@cs.ucr.edu Tel: (951) 827-2558 Office Times: T, Th 1-2 pm 2 1 Course Syllabus
More informationLecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)
Lecture Topics Today: Single-Cycle Processors (P&H 4.1-4.4) Next: continued 1 Announcements Milestone #3 (due 2/9) Milestone #4 (due 2/23) Exam #1 (Wednesday, 2/15) 2 1 Exam #1 Wednesday, 2/15 (3:00-4:20
More informationCS 101, Mock Computer Architecture
CS 101, Mock Computer Architecture Computer organization and architecture refers to the actual hardware used to construct the computer, and the way that the hardware operates both physically and logically
More informationCPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner
CPS104 Computer Organization and Programming Lecture 19: Pipelining Robert Wagner cps 104 Pipelining..1 RW Fall 2000 Lecture Overview A Pipelined Processor : Introduction to the concept of pipelined processor.
More informationAdvanced processor designs
Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The
More informationLECTURE 3: THE PROCESSOR
LECTURE 3: THE PROCESSOR Abridged version of Patterson & Hennessy (2013):Ch.4 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU
More informationReal Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel
More informationFull Datapath. Chapter 4 The Processor 2
Pipelining Full Datapath Chapter 4 The Processor 2 Datapath With Control Chapter 4 The Processor 3 Performance Issues Longest delay determines clock period Critical path: load instruction Instruction memory
More informationProcessor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)
More informationT = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good
CPU performance equation: T = I x CPI x C Both effective CPI and clock cycle C are heavily influenced by CPU design. For single-cycle CPU: CPI = 1 good Long cycle time bad On the other hand, for multi-cycle
More informationCourse web site: teaching/courses/car. Piazza discussion forum:
Announcements Course web site: http://www.inf.ed.ac.uk/ teaching/courses/car Lecture slides Tutorial problems Courseworks Piazza discussion forum: http://piazza.com/ed.ac.uk/spring2018/car Tutorials start
More informationBasic Computer Architecture
Basic Computer Architecture CSCE 496/896: Embedded Systems Witawas Srisa-an Review of Computer Architecture Credit: Most of the slides are made by Prof. Wayne Wolf who is the author of the textbook. I
More informationComputer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John
Computer Performance Evaluation and Benchmarking EE 382M Dr. Lizy Kurian John Evolution of Single-Chip Transistor Count 10K- 100K Clock Frequency 0.2-2MHz Microprocessors 1970 s 1980 s 1990 s 2010s 100K-1M
More informationChapter 4. The Processor
Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified
More informationECE 154A Introduction to. Fall 2012
ECE 154A Introduction to Computer Architecture Fall 2012 Dmitri Strukov Lecture 10 Floating point review Pipelined design IEEE Floating Point Format single: 8 bits double: 11 bits single: 23 bits double:
More informationCS146 Computer Architecture. Fall Midterm Exam
CS146 Computer Architecture Fall 2002 Midterm Exam This exam is worth a total of 100 points. Note the point breakdown below and budget your time wisely. To maximize partial credit, show your work and state
More informationCSE140: Components and Design Techniques for Digital Systems
CSE4: Components and Design Techniques for Digital Systems Tajana Simunic Rosing Announcements and Outline Check webct grades, make sure everything is there and is correct Pick up graded d homework at
More informationEE282 Computer Architecture. Lecture 1: What is Computer Architecture?
EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer
More informationCPU Pipelining Issues
CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput
More informationAlternate definition: Instruction Set Architecture (ISA) What is Computer Architecture? Computer Organization. Computer structure: Von Neumann model
What is Computer Architecture? Structure: static arrangement of the parts Organization: dynamic interaction of the parts and their control Implementation: design of specific building blocks Performance:
More information