CS 152 Computer Architecture and Engineering Lecture 3 Metrics
Transcription
1 CS 152 Computer Architecture and Engineering, Lecture 3: Metrics. John Lazzaro (not a prof - "John" is always OK). TA: Eric Love. www-inst.eecs.berkeley.edu/~cs152/ Play: CS 152 L3: Metrics UC Regents Spring 2014 UCB 1
2 Topics for today's lecture. Metrics: estimating the "goodness" of a CPU design, so that we can redesign the CPU to be "better". Short break. A case study in microcode control: the Motorola 68000, the CPU that powered the original Macintosh [see Lecture 5 slides for this topic]. Administrivia: will announce office hours soon. CS 152 L3: Metrics + Microcode UC Regents Spring 2014 UCB 2
3 On the drawing board. (Todd Hamilton, iWatch concept)
4 Gray-scale computer graphics model. (Todd Hamilton, iWatch concept)
5 Color computer graphics model. (Todd Hamilton, iWatch concept)
6 Animated model. Then the baton is passed to us: we use models to do stepwise refinement of the silicon that powers the consumer product. (Todd Hamilton, iWatch concept)
7 Four metrics. Performance: execution time of a program (today's focus). Energy: joules required to execute a program (for a later lecture). Cost: how many dollars to manufacture (for a later lecture). Time to market: will we ship a product before our competitors? (for a later lecture).
8 Performance Measurement (as seen by the customer) CS 152 L6: Performance UC Regents Fall 2006 UCB 8
9 Who (sensibly) upgrades CPUs often? A professional who turns CPU cycles into money, and who is cycle-limited. Artist tool: animation, video special effects.
10 How to decide to buy a new machine? Measure After Effects execution time on a representative render workload. Night flight (still shot from the movie): city map and clouds computed on the fly with fractals. CPU intensive, trivial I/O.
11 Interpreting Execution Time. PowerBook G4, 1.25 GHz: execution time 1265 seconds. Performance = 1 / Execution Time = 2.85 renders/hour. The 1.5 GHz PB (Y) is N times faster than the 1.25 GHz PB (X). N is? N = Performance(Y) / Performance(X) = Execution Time(X) / Execution Time(Y) = 1.19. PB 1.5 GHz: 3.4 renders/hour. PB 1.25 GHz: 2.85 renders/hour. Might make the difference in meeting a deadline.
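The arithmetic on this slide can be checked with a short sketch. The function names are illustrative (not from the lecture); the inputs are the slide's figures, read as 1265 seconds for machine X and 3.4 renders/hour for machine Y.

```python
SECONDS_PER_HOUR = 3600.0

def renders_per_hour(execution_time_s: float) -> float:
    """Performance as the reciprocal of execution time, scaled to jobs per hour."""
    return SECONDS_PER_HOUR / execution_time_s

def times_faster(perf_y: float, perf_x: float) -> float:
    """N such that machine Y is N times faster than machine X."""
    return perf_y / perf_x

perf_x = renders_per_hour(1265.0)    # 1.25 GHz PowerBook (execution time from the slide)
perf_y = 3.4                         # 1.5 GHz PowerBook (renders/hour from the slide)
n = times_faster(perf_y, perf_x)
time_y = SECONDS_PER_HOUR / perf_y   # implied execution time of Y, in seconds

print(f"X: {perf_x:.2f} renders/hour, N = {n:.2f}, Y needs about {time_y:.0f} s per render")
```

The same N falls out of either ratio: Performance(Y) / Performance(X) or Execution Time(X) / Execution Time(Y).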
12 2 CPUs: Execution Time vs Throughput. Execution time: time for one job to complete. 2 CPUs vs 1 CPU, otherwise similar: 1.8x faster. Implies parallel code on a Mac. Throughput: # of independent jobs/hour completed. Assume G5 MP execution time is faster because AE isn't parallelized on Opteron CPUs. However, G5 and Opteron may have the same throughput.
13 Performance Measurement (as seen by a CPU designer). Q: Why do we care about After Effects' performance? A: We want the CPU we are designing to run it well!
14 Step 1: Analyze the right measurement! CPU Time (guides CPU design): time the CPU spends running the program under measurement. Measuring CPU time (Unix): % time <program name> 25.77u 0.72s 0: % Response Time (guides system design): total time, that is, CPU Time + time spent waiting (for disk, I/O, ...).
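A minimal sketch of the CPU time vs. response time distinction, using Python's standard `time` module rather than the Unix `time` command shown on the slide. The two workloads (`cpu_bound` and `mostly_waiting`) are made-up stand-ins: one computes, the other sleeps to imitate waiting on disk or I/O.

```python
import time

def measure(work):
    """Return (cpu_seconds, wall_seconds) for one call of work()."""
    wall_start = time.perf_counter()   # wall-clock (response) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu, wall

def cpu_bound():
    # Pure computation: CPU time and response time should be close.
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def mostly_waiting():
    # Sleeping stands in for waiting on disk or I/O: response time grows,
    # CPU time barely moves.
    time.sleep(0.2)

cpu_a, wall_a = measure(cpu_bound)
cpu_b, wall_b = measure(mostly_waiting)
print(f"cpu_bound:      cpu={cpu_a:.3f}s wall={wall_a:.3f}s")
print(f"mostly_waiting: cpu={cpu_b:.3f}s wall={wall_b:.3f}s")
```

For CPU design the first number is the one to watch; the second matters when tuning the whole system.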
15 CPU time: Proportional to Instruction Count. CPU time / Program ∝ Machine Instructions / Program. Rationale: every additional instruction you execute takes time. Q: Once the ISA is set, who can influence instruction count? A: Compiler writer, application developer. Q: Static count (lines of program printout)? Or dynamic count (trace of execution)? A: Dynamic. Q: How does an architect influence the number of machine instructions needed to run an algorithm? A: Create new instructions: instruction set architect.
16 CPU time: Proportional to Clock Period. Time / Program ∝ Time / One Clock Period. Rationale: we measure each instruction's execution time in number of cycles. By shortening the period for each cycle, we shorten execution time. Q: How can architects (not technologists) reduce clock period? Q: What ultimately limits an architect's ability to reduce clock period? We will revisit these questions later in lecture.
17 Completing the performance equation. Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle). We need all three terms, and only these terms, to compute CPU Time! CPI: the average number of clock cycles per instruction, for the program. What factors make different programs have different CPIs? Instruction mix varies. Cache behavior varies. Branch prediction varies. Q: When is it OK to compare clock rates? A: When the other RHS terms are equal.
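A small sketch of the three-term equation. The instruction count, CPIs, and clock periods below are invented for illustration. They show why clock rates are only comparable when the other right-hand-side terms are equal.

```python
def cpu_time_seconds(instructions: float, cpi: float, clock_period_s: float) -> float:
    """Seconds/Program = Instructions/Program * Cycles/Instruction * Seconds/Cycle."""
    return instructions * cpi * clock_period_s

INSTRUCTIONS = 1_000_000_000  # hypothetical dynamic instruction count

base = cpu_time_seconds(INSTRUCTIONS, cpi=2.7, clock_period_s=1.0e-9)          # 1 GHz
faster_clock = cpu_time_seconds(INSTRUCTIONS, cpi=2.7, clock_period_s=0.8e-9)  # 1.25 GHz, same CPI
worse_cpi = cpu_time_seconds(INSTRUCTIONS, cpi=3.6, clock_period_s=0.8e-9)     # 1.25 GHz, higher CPI

print(f"base: {base:.2f} s, faster clock: {faster_clock:.2f} s, "
      f"faster clock but worse CPI: {worse_cpi:.2f} s")
```

In this made-up case the 1.25 GHz machine with the higher CPI is slower than the 1 GHz machine, even though its clock rate is higher.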
18 Consider the Lecture 2 single-cycle CPU. All instructions take 1 cycle to execute, every time they run. CPI of any program running on the machine? 1.0. "Average CPI for the program" is a more useful concept for more complicated machines. CS L5: Pipelining UC Regents Fall 2008 UCB 18
19 Recall Lecture 2: Multiflow VLIW CPU. Syntax: ADD $8 $9 $10. Semantics: $8 = $9 + $10. Fields: opcode, rs, rt, rd, shamt, funct. Syntax: ADD $7 $8 $9. Semantics: $7 = $8 + $9. Fields: opcode, rs, rt, rd, shamt, funct. N x 32-bit VLIW yields factor of N speedup! Multiflow: N = 7, 14, or 28 (3 CPUs in product family). Q: Which right-hand-side term decreases with N? Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle). A: Instructions/Program gets smaller. We hope Seconds/Cycle doesn't grow.
20 Consider a machine with a data cache. A program's load instructions "stride" through every memory address. The cache never hits, so every load goes to DRAM (100x slower than loads that go to cache). Thus, the average number of cycles for load instructions is higher for this program. Thus, the average number of cycles for all instructions is higher for this program. Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle). Thus, the program takes longer to run!
21 CPI as an analytical tool to guide design.
Machine CPI (throughput, not latency): Multiply 5, Other ALU 1, Load 2, Store 2, Branch 2.
Program instruction mix: Multiply 30%, Other ALU 20%, Load 20%, Store 10%, Branch 20%.
Program CPI = 5 × 0.30 + 1 × 0.20 + 2 × 0.20 + 2 × 0.10 + 2 × 0.20 = 2.7 cycles/instruction.
Where the program spends its time: Multiply 56%, Load 15%, Branch 15%, Other ALU 7% (20/270), Store 7%.
Now we know how to optimize the design.
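The weighted-CPI calculation on this slide, as a short sketch. The per-class CPIs and the instruction mix are the slide's numbers; the dictionary layout and variable names are just one way to organize them.

```python
# Cycles per instruction for each instruction class (from the slide).
machine_cpi = {"multiply": 5, "other_alu": 1, "load": 2, "store": 2, "branch": 2}

# Fraction of dynamic instructions in each class (from the slide).
instruction_mix = {"multiply": 0.30, "other_alu": 0.20, "load": 0.20,
                   "store": 0.10, "branch": 0.20}

# Program CPI is the mix-weighted average of the per-class CPIs.
program_cpi = sum(machine_cpi[k] * instruction_mix[k] for k in instruction_mix)

# Share of execution time spent in each class.
time_share = {k: machine_cpi[k] * instruction_mix[k] / program_cpi
              for k in instruction_mix}

print(f"program CPI = {program_cpi:.1f} cycles/instruction")
for name, share in sorted(time_share.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {share:5.0%}")
```

Multiply is only 30% of the instructions but more than half of the cycles, which is what points the designer at the multiplier.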
22 Final thoughts: Performance Equation. Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle). Goal is to optimize execution time, not individual equation terms. Machines are optimized with respect to program workloads. The CPI of the program: reflects the program's instruction mix. Clock period: optimize jointly with machine CPI.
23 Invented the "one ISA, many implementations" business model.
24 Amdahl's Law (of Diminishing Returns). Where the program spends its time: Multiply 52%, Load 16%, Branch 16%, Other ALU 8%, Store 8%. If enhancement "E" makes multiply infinitely fast, but other instructions are unchanged, what is the maximum speedup "S"? S = 1 / ((post-enhancement %) / 100%) = 1 / (48% / 100%) = 2.08. Attributed to Gene Amdahl: "Amdahl's Law". What is the lesson of Amdahl's Law? Must enhance computers in a balanced way!
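A minimal sketch of this bound. The function name is illustrative; the 52% figure is the slide's multiply share.

```python
def max_speedup(unaffected_fraction: float) -> float:
    """Best-case speedup when the enhanced part of the program takes zero time."""
    return 1.0 / unaffected_fraction

multiply_share = 0.52                # time spent in multiply (from the slide)
s = max_speedup(1.0 - multiply_share)
print(f"maximum speedup with infinitely fast multiply: {s:.2f}")
```

Even a free multiplier leaves the remaining 48% of the time untouched, which is why the speedup cannot exceed roughly 2x here.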
25 Amdahl's Law in Action. Program we wish to run on N CPUs: Serial 30%, Parallel 70%. The program spends 30% of its time running code that cannot be recoded to run in parallel. S = 1 / ((30% + (70% / N)) / 100%). [Plot: speedup vs. # CPUs]
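The speedup curve for this 30% serial / 70% parallel program can be tabulated with a few lines. The CPU counts chosen below are arbitrary examples, not the values plotted on the original slide.

```python
def amdahl_speedup(serial_fraction: float, n_cpus: int) -> float:
    """Speedup on n_cpus when only the parallel fraction scales with CPU count."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)

SERIAL = 0.30  # from the slide

for n in (1, 2, 4, 8, 16, 1024):
    print(f"{n:5d} CPUs -> speedup {amdahl_speedup(SERIAL, n):.2f}")

# As N grows without bound the speedup approaches 1 / serial_fraction.
limit = 1.0 / SERIAL
print(f"limit as N grows: {limit:.2f}")
```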
26 Real-world 2006: 2 CPUs vs 4 CPUs. 20-inch iMac: Core 2 Duo, 2.16 GHz, $1500. Mac Pro: 2 dual-core Xeons, 2.66 GHz, $00 w/ 20-inch display.
27 Real-world 2006: 2 CPUs vs 4 CPUs. 2 cores on one die vs. 4 cores on two dies (source: Macworld). Amdahl's Law + real-world legacy code issues in action. Simple audio and video tasks: easier to parallelize. ZIPing a file: very difficult to parallelize. Caveat: Mac Pro CPUs are server-class and have architectural advantages (better I/O, ECC DRAM, etc.).
28 Break
29 Timing
30 CPU time: Proportional to Clock Period. Time / Program ∝ Time / One Clock Period. Rationale: we measure each instruction's execution time in number of cycles. By shortening the period for each cycle, we shorten execution time. Q: How can architects (not technologists) reduce clock period? Q: What ultimately limits an architect's ability to reduce clock period? In this part of lecture: we answer these questions.
31 Goal: Determine minimum clock period. [Figure: single-cycle datapath with PC, +0x4 adder, instruction memory, register file (rs1, rs2, ws, wd, rd1, rd2, WE), extender, ALU, Equal comparator, data memory (Addr, Din, Dout, WE), and combinational control logic driving PCSrc, RegDest, RegWr, ExtOp, ALUsrc, ALUctr, MemWr, MemToReg] CS L6: Timing UC Regents Fall 2008 UCB 31
32 A Logic Circuit Primer Models should be as simple as possible, but no simpler Albert Einstein CS 250 L3: Timing UC Regents Fall 2013 UCB
33 Inverters: A simple transistor model. Inverter: Out = NOT In (In 0 gives Out 1; In 1 gives Out 0). Circuit: PMOS from Vdd to Out, NMOS from Out to ground. pFET: a switch, "on" if the gate is grounded. nFET: a switch, "on" if the gate is at Vdd. Correctly predicts logic output for simple static CMOS circuits. Extensions to model subtler circuit families, or to predict timing, have not worked well.
34 Transistors as water valves (cartoon physics). If electrons are water molecules, transistor strengths (W/L) are pipe diameters, and capacitors are buckets: an "on" p-FET fills up the capacitor with charge (water level rises from 0 to 1 over time), and an "on" n-FET empties the bucket (water level falls from 1 to 0 over time). This model is often good enough.
35 What is the bucket? A gate's fan-out. "Fan-out": the number of gate inputs driven by a gate's output. Driving other gates slows a gate down. Driving wires slows a gate down. Driving its own parasitics slows a gate down. [Figures: inverter, NAND gate]
36 Fanout
37 A closer look at fan-out. Driving more gates adds delay. Linear model works for reasonable fan-out. [Plot: delay vs. Cout for Out: Low -> High; 0.5 ns axis mark; slope = 0.0021 ns/fF] FO4: fanout-of-four delay, the delay time of an inverter driving 4 inverters.
38 Propagation delay graphs. Cascaded gates: 1 -> 0, 1 -> 0, 0 -> 1, 0 -> 1. [Plot: inverter transfer function, Vout vs. Vin]
39 Worst-case delay through combinational logic. x = g(a, b, c, d, e, f). If d going 0-to-1 switches x 0-to-1, delay is T1. If a going 0-to-1 switches x 0-to-1, delay is T2. It would be surprising if T1 > T2. T2 "might" be the worst-case delay path (critical path).
40 Why "might"? Wires have delay too. Looks benign, but ... Wire delay: even in those cases where the transmission line effect is negligible, wires possess distributed resistance and capacitance. The time constant associated with distributed RC is proportional to the square of the length. For short wires on ICs, resistance is insignificant (relative to the effective R of transistors), but C is important. Typically around half of the C of a gate load is in the wires. For long wires on ICs (busses, clock lines, global control signals, etc.), resistance is significant, therefore the distributed RC effect dominates; signals are typically "rebuffered" to reduce delay. [Figure: distributed RC ladder with nodes v1 to v4, and node voltages vs. time] (From Spring 2003 EECS150 Lec10-Timing, page 16)
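A rough sketch of why rebuffering helps, assuming the simple model stated on the slide (delay proportional to the square of wire length). The constant `RC_NS_PER_MM2`, the buffer delay, and the wire length are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
RC_NS_PER_MM2 = 0.1     # hypothetical distributed-RC constant, ns per mm^2
BUFFER_DELAY_NS = 0.05  # hypothetical delay of one inserted buffer, ns

def wire_delay_ns(length_mm: float) -> float:
    """Distributed-RC wire delay: grows with the square of the length."""
    return RC_NS_PER_MM2 * length_mm ** 2

def rebuffered_delay_ns(length_mm: float, segments: int) -> float:
    """Split the wire into equal segments with a buffer between each pair."""
    per_segment = wire_delay_ns(length_mm / segments)
    return segments * per_segment + (segments - 1) * BUFFER_DELAY_NS

unbuffered = wire_delay_ns(10.0)
buffered = rebuffered_delay_ns(10.0, segments=4)
print(f"10 mm wire: {unbuffered:.2f} ns unbuffered, {buffered:.2f} ns with 4 segments")
```

Cutting the wire into k segments turns one L² term into k terms of (L/k)², so the wire part of the delay drops by about a factor of k, at the cost of k - 1 buffer delays.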
41 Clocked Logic Circuits
42 From Delay Models to Timing Analysis. Timing analysis: what is the smallest clock period T that produces correct operation? f = 1 MHz: T = 1 μs. f = 10 MHz: T = 100 ns. f = 100 MHz: T = 10 ns. f = 1 GHz: T = 1 ns.
43 Timing Analysis and Logic Delay. Register: an array of flip-flops. [Figure: registers on either side of combinational logic] If our clock period T > worst-case delay through CL, does this ensure correct operation?
44 Flip-flops have internal delays. The value of D is sampled on the positive clock edge. Q outputs the sampled value for the rest of the cycle. [Timing diagram: CLK, D, Q, with t_setup and t_clk-to-q marked]
45 Flip-flop delays eat into the "time budget". [Figure: combinational logic (ALU) between registers] ALU "time budget": T ≥ τ_clk→Q + τ_CL + τ_setup
46 Clock skew also eats into the "time budget". [Figure: two register-CL-register circuits, one launching on CLK and capturing on the delayed clock CLKd, the other the reverse] CLKd: clock skew, delay in distribution. As T → 0, which circuit fails first? T ≥ T_CL + T_setup + T_clk→Q + worst-case skew. Most modern large high-performance chi…
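The time-budget inequality can be turned into a quick calculation. The function names are illustrative and the delay values are hypothetical, in picoseconds.

```python
def min_clock_period_ps(t_clk_to_q: int, t_cl: int, t_setup: int, worst_skew: int = 0) -> int:
    """Smallest T satisfying T >= t_clk_to_q + t_cl + t_setup + worst-case skew."""
    return t_clk_to_q + t_cl + t_setup + worst_skew

def meets_timing(period_ps: int, **delays: int) -> bool:
    """True if the given clock period satisfies the setup-time budget."""
    return period_ps >= min_clock_period_ps(**delays)

# Hypothetical delays, in picoseconds.
delays = dict(t_clk_to_q=100, t_cl=700, t_setup=50, worst_skew=20)

t_min = min_clock_period_ps(**delays)
f_max_ghz = 1000.0 / t_min  # a period in ps corresponds to 1000/T GHz
print(f"T_min = {t_min} ps, f_max = {f_max_ghz:.3f} GHz")
```

Only the t_cl term is the logic the designer actually wanted to run; clock-to-Q, setup, and skew are overhead paid on every cycle.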
47 Clocks have dedicated wires (low skew). [Figure: clock tree with global clock inputs GCLK0 to GCLK7, BUFGMUX and DCM blocks, and top, bottom, and horizontal spines] Flip-flop clock inputs are the leaves of the tree. From: Xilinx Spartan 3 data sheet. Virtex is similar. CS 152 L5: Timing UC Regents Fall 2006 UCB 47
48 Die photo: Xilinx Virtex Pro. Gold wires form the clock tree.
49 Clock Tree Delays, IBM "Power" CPU. [Figure: delay over x and y; grid, tuned sector trees, sector buffers, buffer level 1, buffer level 2]
50 Clock Tree Delays, IBM Power. [Figure: delay over x and y; waveforms of volts (V) vs. time (ps) showing 20 ps skew; multiple-fingered transmission line]
51 Some flip-flops have "hold" time. [Timing diagram: CLK, D, Q with t_setup, t_hold, and t_inv marked; D must stay stable during the setup and hold window] [Circuit: flip-flop with an inverter in its feedback path] What is the intended function of this circuit? Does flip-flop hold time affect the operation of this circuit? Under what conditions? For correct operation: t_clk-to-q + t_inv > t_hold.
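The hold-time condition on this slide can be written as a one-line check. The delay values below are hypothetical, in picoseconds, and the function name is illustrative.

```python
def hold_time_ok(t_clk_to_q: int, t_inv: int, t_hold: int) -> bool:
    """D must not change until t_hold after the clock edge.

    The earliest D can change is after the flip-flop output updates
    (t_clk_to_q) and that change passes through the inverter (t_inv).
    """
    return t_clk_to_q + t_inv > t_hold

# Hypothetical delays, in picoseconds.
print(hold_time_ok(t_clk_to_q=100, t_inv=40, t_hold=60))   # feedback path slower than the hold window
print(hold_time_ok(t_clk_to_q=20, t_inv=10, t_hold=60))    # feedback path too fast
```

Unlike the setup constraint, this check does not involve the clock period T, so slowing the clock cannot repair a hold violation.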
52 Searching for processor critical path. [Figure: the single-cycle datapath from slide 31]
53 Searching for processor critical path. Timing analysis: what is the smallest T that produces correct operation? Must consider all "connected" register pairs. Q: Why might I suspect this one? A: Very long wire on the path.
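A toy version of this search, assuming each connected register pair has a known worst-case combinational delay. The register names and all delay values are hypothetical; a real timing analyzer derives them from the netlist and layout.

```python
T_CLK_TO_Q_PS = 100  # hypothetical flip-flop clock-to-Q delay
T_SETUP_PS = 50      # hypothetical flip-flop setup time

# Worst-case combinational delay between each connected (source, destination)
# register pair, in picoseconds. Names and numbers are made up.
paths_ps = {
    ("PC", "RegFile"): 600,
    ("RegFile", "RegFile"): 450,
    ("PC", "PC"): 300,
}

def required_period_ps(cl_delay_ps: int) -> int:
    """Clock period this path needs: clock-to-Q + logic + setup."""
    return T_CLK_TO_Q_PS + cl_delay_ps + T_SETUP_PS

# The critical path is the register pair that demands the longest period.
critical_pair = max(paths_ps, key=lambda pair: required_period_ps(paths_ps[pair]))
min_period = required_period_ps(paths_ps[critical_pair])

# Timing slack: how much time each path has to spare at that clock period.
slack_ps = {pair: min_period - required_period_ps(d) for pair, d in paths_ps.items()}

print(f"critical path: {critical_pair}, T_min = {min_period} ps")
for pair, slack in slack_ps.items():
    print(f"{pair}: slack {slack} ps")
```

The critical path has zero slack by construction; every other pair has time to spare.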
54 Combinational paths for IBM Power 4 CPU. [Histogram: late-mode timing checks (thousands) vs. timing slack (ps), with the critical path marked] Most wires have hundreds of picoseconds to spare. From "The circuit and physical design of the POWER4 microprocessor", IBM J. Res. and Dev., 46:1, Jan 2002, J. D. Warnock et al.
55 Power 4: Timing Estimation, Closure. Timing estimation: predicting a processor's clock rate early in the project. From "The circuit and physical design of the POWER4 microprocessor", IBM J. Res. and Dev., 46:1, Jan 2002, J. D. Warnock et al.
56 Power 4: Timing Estimation, Closure. Timing closure: meeting (or exceeding!) the timing estimate. From "The circuit and physical design of the POWER4 microprocessor", IBM J. Res. and Dev., 46:1, Jan 2002, J. D. Warnock et al.
57 Floorplanning: essential to meet timing. (Intel XScale 80200)
59 CPU time: Proportional to Clock Period. Time / Program ∝ Time / One Clock Period. Rationale: we measure each instruction's execution time in number of cycles. By shortening the period for each cycle, we shorten execution time. Q: How can architects (not technologists) reduce clock period? A: Shorten the machine's critical path. Q: What ultimately limits an architect's ability to reduce clock period? A: Clock-to-Q, setup times, 2-D floorplanning geometry.
60 On Thursday: pipeline design, with enough detail to do a design. Have fun in section!
More informationMemory. Lecture 22 CS301
Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch
More informationComputer Architecture. Lecture 6.1: Fundamentals of
CS3350B Computer Architecture Winter 2015 Lecture 6.1: Fundamentals of Instructional Level Parallelism Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and