Instructor Information

Size: px
Start display at page:

Download "Instructor Information"

Transcription

1 Instructor Information 55:32/22C:60 High Performance Computer Architecture Spring 2009 Instructor: Jon Kuhl (That s me) Office: 4322 SC Office Hours: 9:00-0:30 a.m. TTh ( Other times by appointment) kuhl@engineering.uiowa.edu Phone: (39) TA: Shrae Rai Office: 33 SC Office hours: t.b.d. Class Info. Website: Texts: Required: Shen and Lipasti, Modern Processor Design- -Fundamentals of Superscalar Processors, McGraw Hill, Supplemental: Thomas and Moorby, The Verilog Hardware Description Language,Third Edition, Kluwer Academic Publishers, 996. Additional Reference: Hennessy and Patterson, Computer Architecture A Quantitative Approach, Morgan Kaufmann, Fourth Edition, 2007 Course Objectives Understand quantitative measures for assessing and comparing processor performance Understand modern processor design techniques, including: Pipelining high performance memory architecture instruction-level parallelism multi-threading Multi-core architecture Master the use of modern design tools (HDLs) to design and analyze processors Do case studies of contemporary processors Discuss future trends in processor design Jon Kuhl

2 Expected Background A previous course in computer architecture/organization covering: Instruction set architecture (ISA) Addressing modes Assembly language Basic computer organization Memory system organization Cache virtual Etc. 22C:060 or 55:035 or equivalent Course Organization Homework assignments--several Two projects (design/analysis exercises using the Verilog HDL and ModelSim simulation environment) Two exams: Midterm Th. March 2, in class Final Mon. May, noon-2:00 p.m. Course Organization--continued Grading: Exams: Better of midterm/final exam score: 35% Poorer of midterm/final exam scores: 25% Homework: 0% Projects 30% Historical Perspectives The Decade of the 970 s: Birth of Microprocessors Programmable Controller Single-Chip Microprocessors Personal Computers (PC) The Decade of the 980 s: Quantitative Architecture Instruction Pipelining Fast Cache Memories Compiler Considerations Workstations The Decade of the 990 s: Instruction-Level Parallelism Superscalar,Speculative Microarchitectures Aggressive Compiler Optimizations Low-Cost Desktop Supercomputing Jon Kuhl 2

3 Moore s Law Moore s Law (965) The number of devices that can be integrated on a single piece of silicon will double roughly every 8-24 months Moore s law has held true for 40 years and will continue to hold for at least another decade. Moore s Law Processor Performance Source: Wikipedia Jon Kuhl 3

4 Performance Growth in Perspective Doubling every 24 months ( ): total of 524,000X Cars travel at 40 million MPH; get 0 million miles/gal. Air travel: L.A. to N.Y. in 0.04 seconds Corn yield: 00 million bushels per acre A Quote from Robert Cringely If the automobile had followed the same development as the computer, a Rolls- Royce would today cost $00, get a million miles per gallon, and explode once a year killing everyone inside. Moore s Law--Processor Power Consumption Moore s Law Clock Frequency Jon Kuhl 4

5 Relationship between clock rate and power Intel Estimate : Increasing clock rate by 25% will yield approx. 5% performance increase But power consumption will be doubled Power Consumption/Heat Dissapation issues are ushering a new era in CPU design Focus on performance per watt Causing fundamental rethinking of archtecture Processor Design ARCHITECTURE (ISA) programmer/compiler view Functional appearance (interface) to user/system programmer Opcodes, addressing modes, architected registers, IEEE floating point Serves as specification for processor design IMPLEMENTATION (µarchitecture) processor designer view Logical structure or organization that performs the architecture Pipelining, functional units, caches, physical registers REALIZATION (Chip) chip/system designer view Physical structure that embodies the implementation Gates, cells, transistors, wires Phillip E. Ross, Why Clock Frequency Stalled, IEEE Spectrum, April, 2008 Iron Law Iron Law Considering Power Time Program = Time Program X Watts = Instructions Program (code size) X Cycles Instruction (CPI) X Time Cycle (cycle time) Instructions Program (code size) X Cycles Instruction (CPI) X Time Cycle (cycle time) X Watts Architecture --> Implementation --> Realization Compiler Designer Processor Designer Chip Designer Architecture --> Implementation --> Realization Compiler Designer Processor Designer Chip Designer Jon Kuhl 5

6 Iron Law Instructions/Program Instructions executed, not static code size Determined by algorithm, compiler, ISA Cycles/Instruction Determined by ISA and CPU organization Overlap among instructions reduces this term Time/cycle Determined by technology, organization, clever circuit design Overall Goal Minimize time, which is the product, NOT isolated terms Common error to miss terms while devising optimizations E.g. ISA change to decrease instruction count BUT leads to CPU organization which makes clock slower Bottom line: terms are inter-related This is the crux of the RISC vs. CISC argument Iron Law Example Machine A: clock ns, CPI 2.0, for program P Machine B: clock 2ns, CPI.2, for program P Which is faster and how much? Time/Program = instr/program x cycles/instr x sec/cycle Time(A) = N x 2.0 x = 2N Time(B) = N x.2 x 2 = 2.4N Compare: Time(B)/Time(A) = 2.4N/2N =.2 So, Machine A is 20% faster than Machine B for this program Iron Law Example Keep ns and For equal performance, if CPI(B)=.2, what is CPI(A)? Time(B)/Time(A) = = (Nx2x.2)/(NxxCPI(A)) CPI(A) = 2.4 Jon Kuhl 6

7 Iron Law Example Keep CPI(A)=2.0 and CPI(B)=.2 For equal performance, if clock(b)=2ns, what is clock(a)? Time(B)/Time(A) = = (N x 2.0 x clock(a))/(n x.2 x 2) clock(a) =.2ns Another Example OP Freq Cycles ALU 43% Load 2% Store 2% 2 Branch 24% 2 Assume stores can execute in cycle by slowing clock 5% Should this be implemented? Example-- Let s do the math: OP Freq Cycles ALU 43% Load 2% Store 2% 2 Branch 24% 2 Old CPI = x x 2 =.36 New CPI = x 2 =.24 Speedup = old time/new time = {P x old CPI x T}/{P x new CPI x.5 T} = (.36)/(.24 x.5) = 0.95 Answer: Don t make the change Instruction Set Architecture ISA, the boundary between software and hardware Specifies the logical machine that is visible to the programmer Also, a functional spec for the processor designers What needs to be specified by an ISA Operations what to perform and what to perform next Temporary Operand Storage in the CPU accumulator, stacks, registers Number of operands per instruction Operand location where and how to specify the operands Type and size of operands Instruction-to-Binary Encoding Jon Kuhl 7

8 Basic ISA Classification Stack Architecture (zero operand): Operands popped from stack Result pushed on stack Accumulator (one operand): Special accumulator register is implicit operand Other operand from memory Register-Memory (two operand): One operand from register, other from memory Generally, one of the source operands is also the destination A few architectures e.g. VAX, M68000 have allowed mem. to mem. operations Register-Register or Load/Store (three operand): All operands for ALU instructions must be registers General format R d <= R s op R t Separate Load and Store instructions for memory access Other Important ISA Considerations Number of registers Addressing modes Data types/sizes Instruction functionality simple vs. complex Branch/jump/subroutine call functionality Exception handling Instruction format/size/regularity Etc. register: R i Addressing Modes displacement M[R i + #n] immediate: #n register indirect M[R i ] indexed: M[R i + R j ] memory indirect: M[M[R i ]] auto-decrement: M[R i ]; R i -= d scaled: M[R i + #n + R j * d] update: M[R i = R i + #n] absolute: M[#n] auto-increment: M[R i ]; R i += d Modes -4 account for 93% of all VAX operands [Clark and Emer] Operations arithmetic and logical - and, add data transfer control system floating point decimal string - move, load, store - branch, jump, call - system call, traps - add, mul, div, sqrt - addd, convert - move, compare multimedia? 2D, 3D? e.g., Intel MMX/SSE and Sun VIS Jon Kuhl 8

9 Control Instructions (Branches). Types of Branches A. Conditional or Unconditional B. Save PC? C. How is target computed? Single target (immediate, PC+immediate) Multiple targets (register) 2. Branch Architectures A. Condition code or condition registers B. Register Save or Restore State What state? function calls: registers (CISC) system calls: registers, flags, PC, PSW, etc Hardware need not save registers caller can save registers in use callee save registers it will use Hardware register save IBM STM, VAX CALLS faster? Most recent architectures do no register saving Or do implicit register saving with register windows (SPARC) Anatomy of a Modern ISA Operations simple ALU op s, data movement, control transfer Temporary Operand Storage in the CPU Large General Purpose Register (GPR) File Three operands per ALU instruction (all registers) A B op C Operand location load-store architecture with register indirect addressing Type and size of operands 32/64-bit integers, IEEE floats Instruction-to-Binary Encoding Fixed width, regular fields Exceptions: Intel x86, IBM 390 (aka z900) MIPS ISA The MIPS ISA was one of the first RISC instruction sets (985) Similar to ISAs of other RISC processors: Sun SPARC, HP PA-RISC,DEC Alpha Main characteristics Load-store architecture Three operand format (R d <= R s op R t ) 32 General Purpose Registers (actually 3) Only one addressing mode for memory operands: reg.indirect w. displ. Limited, highly orthogonal instruction set: 52 instructions Simple branch/jump/subroutine call architecture Jon Kuhl 9

10 MIPS Instruction Format Type -3- format (bits) -0- R opcode (6) rs (5) rt (5) rd (5) shamt (5) funct (6) I opcode (6) rs (5) rt (5) immediate (6) J opcode (6) address (26) Four instruction formats (branch format, not shown above, is similar to I format) All instructions are 32 bits with fixed op-code size A CISC ISA x86 (IA-32) This ISA was first introduced with the Intel 8086 processor in 978 Has evolved, with many additions over the years Main characteristics: Reg-mem architecture ALU instructions can have memory operands Two operand format one source operand is also destination Eight general purpose registers Seven addressing modes More than 500 instructions Instruction set is non-orthogonal Highly variable instruction size and format instruction size varies from to 7 bytes. X86 Instruction Format X86 Addressing Modes Absolute Register indirect Based Based indexed Based indexed with displacement Based with scaled index Based with scaled index and displacement Jon Kuhl 0

11 Dynamic-Static Interface Program (Software) Architecture Machine (Hardware) Compiler complexity Hardware complexity Exposed to software Hidden in hardware Semantic gap between s/w and h/w Static (DSI) Dynamic Placement of DSI determines how gap is bridged Dynamic-Static Interface DEL ~CISC ~VLIW ~RISC HLL Program DSI- DSI-2 DSI-3 Hardware Low-level DSI exposes more knowledge of hardware through the ISA Places greater burden on compiler/programmer Optimized code becomes specific to implementation In fact: happens for higher-level DSI also The Role of the Compiler Phases to manage complexity Parsing --> intermediate representation Procedure inlining Loop Optimizations Common Sub-Expression Jump Optimization Constant Propagation Register Allocation Strength Reduction Pipeline Scheduling Code Generation --> assembly code Performance and Cost Which computer is fastest? Not so simple Scientific simulation FP performance Program development Integer performance Commercial workload Memory, I/O Jon Kuhl

12 Performance of Computers Want to buy the fastest computer for what you want to do? Workload is all-important Correct measurement and analysis Want to design the fastest computer for what the customer wants to pay? Cost is always an important criterion Speed is not always the only performance criteria: Power Area Defining Performance What is important to whom? Computer system user Minimize elapsed time for program = time_end time_start Called response time Computer center manager Maximize completion rate = #jobs/second Called throughput Improve Performance Improve (a) response time or (b) throughput? Faster CPU Helps both (a) and (b) Add more CPUs Helps (b) and perhaps (a) due to less queuing Performance Comparison Machine A is n times faster than machine B iff perf(a)/perf(b) = time(b)/time(a) = n Machine A is x% faster than machine B iff perf(a)/perf(b) = time(b)/time(a) = + x/00 E.g. time(a) = 0s, time(b) = 5s 5/0 =.5 => A is.5 times faster than B 5/0 =.5 => A is 50% faster than B Jon Kuhl 2

13 Other Metrics MIPS and MFLOPS MIPS = instruction count/(execution time x 0 6 ) = clock rate/(cpi x 0 6 ) But MIPS has serious shortcomings Problems with MIPS E.g. without FP hardware, an FP op may take 50 single-cycle instructions With FP hardware, only one 2-cycle instruction Thus, adding FP hardware: CPI increases (why?) Instructions/program decreases (why?) Total execution time decreases BUT, MIPS gets worse! 50/50 => 2/ 50 => 50 => 2 50 MIPS => 2 MIPS Problems with MIPS Ignore program Usually used to quote peak performance Ideal conditions => guarantee not to exceed! When is MIPS ok? Same compiler, same ISA E.g. same binary running on Pentium-III, IV Why? Instr/program is constant and can be ignored Other Metrics MFLOPS = FP ops in program/(execution time x 0 6 ) Assuming FP ops independent of compiler and ISA Often safe for numeric codes: matrix size determines # of FP ops/program However, not always safe: Missing instructions (e.g. FP divide, sqrt/sin/cos) Optimizing compilers Relative MIPS and normalized MFLOPS Normalized to some common baseline machine E.g. VAX MIPS in the 980s Jon Kuhl 3

14 Which Programs Execution time of what program? Best case you always run the same set of programs Port them and time the whole workload In reality, use benchmarks Programs chosen to measure performance Predict performance of actual workload Saves effort and money Representative? Honest? Benchmarketing Types of Benchmarks Real programs representative of real workload only accurate way to characterize performance requires considerable work Kernels or microbenchmarks representative program fragments good for focusing on individual features not big picture Instruction mixes instruction frequency of occurrence; calculate CPI Benchmarks: SPEC2000 Benchmarks: SPEC CINT2000 System Performance Evaluation Cooperative Formed in 80s to combat benchmarketing SPEC89, SPEC92, SPEC95, now SPEC integer and 4 floating-point programs Sun Ultra-5 300MHz reference machine has score of 00 Report geometric mean of ratios to reference machine Benchmark 64.gzip 75.vpr 76.gcc 8.mcf 86.crafty 97.parser 252.eon 253.perlbmk 254.gap 255.vortex 256.bzip2 300.twolf Description Compression FPGA place and route C compiler Combinatorial optimization Chess Word processing, grammatical analysis Visualization (ray tracing) PERL script execution Group theory interpreter Object-oriented database Compression Place and route simulator Jon Kuhl 4

15 Benchmarks: SPEC CFP2000 Benchmark Description 68.wupwise 7.swim 72.mgrid 73.applu 77.mesa 78.galgel 79.art 83.equake 87.facerec 88.ammp 89.lucas 9.fma3d 200.sixtrack 30.apsi Physics/Quantum Chromodynamics Shallow water modeling Multi-grid solver: 3D potential field Parabolic/elliptic PDE 3-D graphics library Computational Fluid Dynamics Image Recognition/Neural Networks Seismic Wave Propagation Simulation Image processing: face recognition Computational chemistry Number theory/primality testing Finite-element Crash Simulation High energy nuclear physics accelerator design Meteorology: Pollutant distribution Benchmark Pitfalls Benchmark not representative Your workload is I/O bound, SPECint is useless Benchmark is too old Benchmarks age poorly; benchmarketing pressure causes vendors to optimize compiler/hardware/software to benchmarks Need to be periodically refreshed Benchmark Pitfalls Choosing benchmark from the wrong application space e.g., in a realtime environment, choosing gcc Choosing benchmarks from no application space e.g., synthetic workloads, esp. unvalidated ones Using toy benchmarks (dhrystone, whetstone) e.g., used to prove the value of RISC in early 80 s Mismatch of benchmark properties with scale of features studied e.g., using SPECINT for large cache studies Benchmark Pitfalls Carelessly scaling benchmarks Truncating benchmarks Using only first few million instructions Reducing program data size Too many easy cases May not show value of a feature Too few easy cases May exaggerate importance of a feature Jon Kuhl 5

16 Scalar to Superscalar Scalar processor Fetches and issues at most one instruction per machine cycle Superscalar processor-- Fetches and issues multiple instructions per machine cycle Can also define superscalar in terms of how many instructions can complete execution in a given machine cycle. Note that only a superscalar architecture can achieve a CPI of less than Processor Performance Time Processor Performance = Program Instructions Cycles = X X Program Instruction (code size) (CPI) In the 980 s (decade of pipelining): CPI: 5.0 =>.5 In the 990 s (decade of superscalar): CPI:.5 => 0.5 (best case) Time Cycle (cycle time) No. of Processors Amdahl s Law (Originally formulated for vector processing) N -f f Time f = fraction of program that is vectorizable (-f) = fraction that is serial N = speedup for vectorizable portion Overall speedup: Speedup = ( f ) + f N Amdahl s Law--Continued Sequential bottleneck Even if N is infinite Performance limited by nonvectorizable portion (-f) lim = N f f + N f Jon Kuhl 6

17 Ramifications of Amdahl s Law Maximum Achievable Speedup Consider: f = 0.9, (-f) = 0. For N, Speedup 0 Consider: f = 0.5, (-f) = 0.5 For N infinity, Speedup 2 Consider: f 0., (-f) = 0.9 For N infinity, Speedup. Speedup Speedup parallelizable fraction f Inputs I, I 2, I 3, Pipelining time T Unpipelined operation Time required to process K inputs = KT Perfect Pipeline (N stages): T/N Stage T/N Stage 2 T/N Stage 3 T/N Stage N Outputs O, 0 2,.. I I 2 I I 3 I 2 I I N I N- I N-2 I O Time required to process K inputs = (K + N-)(T/N) Note For K >>N, the processing time approaches KT/N Pipelined Performance Model N Pipeline Depth -g g g = fraction of time pipeline is filled -g = fraction of time pipeline is not filled (stalled) Jon Kuhl 7

18 Amdahl s Law Applied to Pipelining No. of stages N -g g g = fraction of time the pipeline is full (-g) = fraction that it is not full N = pipeline depth Overall speedup: Speedup = ( g ) + g N Time Pipelined Performance Model N Pipeline Depth -g g g = fraction of time pipeline is filled -g = fraction of time pipeline is not filled (stalled) Pipelined Performance Model N Pipeline Depth -g g Tyranny of Amdahl s Law [Bob Colwell] When g is even slightly below 00%, a big performance hit will result Stalled cycles are the key adversary and must be minimized as much as possible Superscalar Proposal Moderate tyranny of Amdahl s Law Ease sequential bottleneck More generally applicable Robust (less sensitive to f) Revised Amdahl s Law: s = amount of parallelism for nonvectorizable instructions Speedup = ( f ) f + s N Jon Kuhl 8

19 Motivation for Superscalar [Agerwala and Cocke] Speedup p Speedup jumps from 3 to 4.3 for N=6, f=0.8, but s =2 instead of s= (scalar) n=6,s=2 Typical Range n=00 n=2 n=6 n= Vectorizability f Limits on Instruction Level Parallelism (ILP) Weiss and Smith [984].58 Sohi and Vajapeyam [987].8 Tjaden and Flynn [970] Tjaden and Flynn [973].96 Uht [986] 2.00 Smith et al. [989] 2.00 Jouppi and Wall [988] 2.40 Johnson [99] 2.50 Acosta et al. [986] 2.79 Wedig [982] 3.00 Butler et al. [99] 5.8 Melvin and Patt [99] 6 Wall [99] Kuck et al. [972] 8 Riseman and Foster [972] Nicolau and Fisher [984].86 (Flynn s bottleneck) 7 (Jouppi disagreed) 5 (no control dependences) 90 (Fisher s optimism) Superscalar Proposal Go beyond single instruction pipeline, achieve IPC > Dispatch multiple instructions per cycle Provide more generally applicable form of concurrency (not just vectors) Geared for sequential code that is hard to parallelize otherwise Exploit fine-grained or instruction-level parallelism (ILP) Classifying ILP Machines [Jouppi, DECWRL 99] Baseline scalar RISC Issue parallelism = IP = Operation latency = OP = Peak IPC = SUCCESSIVE INSTRUCTIONS IF DE EX WB TIME IN CYCLES (OF BASELINE MACHINE) Jon Kuhl 9

20 Classifying ILP Machines [Jouppi, DECWRL 99] Superpipelined: cycle time = /m of baseline Issue parallelism = IP = inst / minor cycle Operation latency = OP = m minor cycles Peak IPC = m instr / major cycle (m x speedup?) IF DE EX WB Classifying ILP Machines [Jouppi, DECWRL 99] Superscalar: Issue parallelism = IP = n inst / cycle Operation latency = OP = cycle Peak IPC = n instr / cycle (n x speedup?) IF DE EX WB Classifying ILP Machines [Jouppi, DECWRL 99] VLIW: Very Long Instruction Word Issue parallelism = IP = n inst / cycle Operation latency = OP = cycle Peak IPC = n instr / cycle = VLIW / cycle Classifying ILP Machines [Jouppi, DECWRL 99] Superpipelined-Superscalar Issue parallelism = IP = n inst / minor cycle Operation latency = OP = m minor cycles Peak IPC = n x m instr / major cycle IF DE EX WB IF DE EX WB Jon Kuhl 20

21 Superscalar vs. Superpipelined Roughly equivalent performance If n = m then both have about the same IPC Parallelism exposed in space vs. time SUPERSCALAR SUPERPIPELINED Key: IFetch Dcode Execute Writeback Time in Cycles (of Base Machine) Jon Kuhl 2

Computer System. Performance

Computer System. Performance Computer System Performance Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/

More information

55:132/22C:160, HPCA Spring 2011

55:132/22C:160, HPCA Spring 2011 55:132/22C:160, HPCA Spring 2011 Second Lecture Slide Set Instruction Set Architecture Instruction Set Architecture ISA, the boundary between software and hardware Specifies the logical machine that is

More information

Instructor Information

Instructor Information Instructor Information 55:32/22C:60 High Performance Computer Architecture Spring 2008 Instructor: Jon Kuhl (That s me) Office: 406A SC Office Hours: 0:30-:30 a.m. MWF ( Other times by appointment) E-mail:

More information

Pipelining to Superscalar

Pipelining to Superscalar Pipelining to Superscalar ECE/CS 752 Fall 207 Prof. Mikko H. Lipasti University of Wisconsin-Madison Pipelining to Superscalar Forecast Limits of pipelining The case for superscalar Instruction-level parallel

More information

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti

ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti ECE/CS 552: Pipelining to Superscalar Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith Pipelining to Superscalar Forecast Real

More information

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example

Which is the best? Measuring & Improving Performance (if planes were computers...) An architecture example 1 Which is the best? 2 Lecture 05 Performance Metrics and Benchmarking 3 Measuring & Improving Performance (if planes were computers...) Plane People Range (miles) Speed (mph) Avg. Cost (millions) Passenger*Miles

More information

Pipeline Processor Design

Pipeline Processor Design Pipeline Processor Design Beyond Pipeline Architecture Virendra Singh Computer Design and Test Lab. Indian Institute of Science Bangalore virendra@computer.org Advance Computer Architecture Branch Hazard

More information

1.6 Computer Performance

1.6 Computer Performance 1.6 Computer Performance Performance How do we measure performance? Define Metrics Benchmarking Choose programs to evaluate performance Performance summary Fallacies and Pitfalls How to avoid getting fooled

More information

EE382A Lecture 3: Superscalar and Out-of-order Processor Basics

EE382A Lecture 3: Superscalar and Out-of-order Processor Basics EE382A Lecture 3: Superscalar and Out-of-order Processor Basics Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 3-1 Announcements HW1 is due today Hand

More information

Performance, Cost and Amdahl s s Law. Arquitectura de Computadoras

Performance, Cost and Amdahl s s Law. Arquitectura de Computadoras Performance, Cost and Amdahl s s Law Arquitectura de Computadoras Arturo Díaz D PérezP Centro de Investigación n y de Estudios Avanzados del IPN adiaz@cinvestav.mx Arquitectura de Computadoras Performance-

More information

Instructor Information

Instructor Information CS 203A Advanced Computer Architecture Lecture 1 1 Instructor Information Rajiv Gupta Office: Engg.II Room 408 E-mail: gupta@cs.ucr.edu Tel: (951) 827-2558 Office Times: T, Th 1-2 pm 2 1 Course Syllabus

More information

Beyond Pipelining. CP-226: Computer Architecture. Lecture 23 (19 April 2013) CADSL

Beyond Pipelining. CP-226: Computer Architecture. Lecture 23 (19 April 2013) CADSL Beyond Pipelining Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/

More information

Computer Science 246. Computer Architecture

Computer Science 246. Computer Architecture Computer Architecture Spring 2010 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture Outline Performance Metrics Averaging Amdahl s Law Benchmarks The CPU Performance Equation Optimal

More information

Lecture 2: Computer Performance. Assist.Prof.Dr. Gürhan Küçük Advanced Computer Architectures CSE 533

Lecture 2: Computer Performance. Assist.Prof.Dr. Gürhan Küçük Advanced Computer Architectures CSE 533 Lecture 2: Computer Performance Assist.Prof.Dr. Gürhan Küçük Advanced Computer Architectures CSE 533 Performance and Cost Purchasing perspective given a collection of machines, which has the - best performance?

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Superscalar Organization

Superscalar Organization Superscalar Organization Nima Honarmand Instruction-Level Parallelism (ILP) Recall: Parallelism is the number of independent tasks available ILP is a measure of inter-dependencies between insns. Average

More information

High-Performance Microarchitecture Techniques John Paul Shen Director of Microarchitecture Research Intel Labs

High-Performance Microarchitecture Techniques John Paul Shen Director of Microarchitecture Research Intel Labs High-Performance Microarchitecture Techniques John Paul Shen Director of Microarchitecture Research Intel Labs October 29, 2002 Microprocessor Research Forum Intel s Microarchitecture Research Labs! USA:

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Lecture 4: Instruction Set Architecture

Lecture 4: Instruction Set Architecture Lecture 4: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation Reading: Textbook (5 th edition) Appendix A Appendix B (4 th edition)

More information

Computer Performance Evaluation: Cycles Per Instruction (CPI)

Computer Performance Evaluation: Cycles Per Instruction (CPI) Computer Performance Evaluation: Cycles Per Instruction (CPI) Most computers run synchronously utilizing a CPU clock running at a constant clock rate: where: Clock rate = 1 / clock cycle A computer machine

More information

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Instruction Set Principles The Role of Compilers MIPS 2 Main Content Computer

More information

CPU Performance Evaluation: Cycles Per Instruction (CPI) Most computers run synchronously utilizing a CPU clock running at a constant clock rate:

CPU Performance Evaluation: Cycles Per Instruction (CPI) Most computers run synchronously utilizing a CPU clock running at a constant clock rate: CPI CPU Performance Evaluation: Cycles Per Instruction (CPI) Most computers run synchronously utilizing a CPU clock running at a constant clock rate: Clock cycle where: Clock rate = 1 / clock cycle f =

More information

Introduction To Computer Architecture

Introduction To Computer Architecture Introduction To Computer Architecture. Virendra Singh Computer Design and Test Lab. Supercomputer Education and Research Centre Indian Institute of Science Bangalore http://www.serc.iisc.ernet.in/~viren

More information

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture

EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture EITF20: Computer Architecture Part2.1.1: Instruction Set Architecture Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Instruction Set Principles The Role of Compilers MIPS 2 Main Content Computer

More information

Lecture Topics. Principle #1: Exploit Parallelism ECE 486/586. Computer Architecture. Lecture # 5. Key Principles of Computer Architecture

Lecture Topics. Principle #1: Exploit Parallelism ECE 486/586. Computer Architecture. Lecture # 5. Key Principles of Computer Architecture Lecture Topics ECE 486/586 Computer Architecture Lecture # 5 Spring 2015 Portland State University Quantitative Principles of Computer Design Fallacies and Pitfalls Instruction Set Principles Introduction

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41 Performance II CS61C L41 Performance II (1) Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia UWB Ultra Wide Band! The FCC moved

More information

1.3 Data processing; data storage; data movement; and control.

1.3 Data processing; data storage; data movement; and control. CHAPTER 1 OVERVIEW ANSWERS TO QUESTIONS 1.1 Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Multiple Issue ILP Processors. Summary of discussions

Multiple Issue ILP Processors. Summary of discussions Summary of discussions Multiple Issue ILP Processors ILP processors - VLIW/EPIC, Superscalar Superscalar has hardware logic for extracting parallelism - Solutions for stalls etc. must be provided in hardware

More information

Chapter 13 Reduced Instruction Set Computers

Chapter 13 Reduced Instruction Set Computers Chapter 13 Reduced Instruction Set Computers Contents Instruction execution characteristics Use of a large register file Compiler-based register optimization Reduced instruction set architecture RISC pipelining

More information

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture 1 L E C T U R E 0 J A N L E M E I R E Course Objectives 2 Intel 4004 1971 2.3K trans. Intel Core 2 Duo 2006 291M trans. Where have all the transistors gone? Turing Machine

More information

Outline. What Makes a Good ISA? Programmability. Implementability

Outline. What Makes a Good ISA? Programmability. Implementability Outline Instruction Sets in General MIPS Assembly Programming Other Instruction Sets Goals of ISA Design RISC vs. CISC Intel x86 (IA-32) What Makes a Good ISA? Programmability Easy to express programs

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

ECE C61 Computer Architecture Lecture 2 performance. Prof. Alok N. Choudhary.

ECE C61 Computer Architecture Lecture 2 performance. Prof. Alok N. Choudhary. ECE C61 Computer Architecture Lecture 2 performance Prof Alok N Choudhary choudhar@ecenorthwesternedu 2-1 Today s s Lecture Performance Concepts Response Time Throughput Performance Evaluation Benchmarks

More information

EECS2021. EECS2021 Computer Organization. EECS2021 Computer Organization. Morgan Kaufmann Publishers September 14, 2016

EECS2021. EECS2021 Computer Organization. EECS2021 Computer Organization. Morgan Kaufmann Publishers September 14, 2016 EECS2021 Computer Organization Fall 2015 The slides are based on the publisher slides and contribution from Profs Amir Asif and Peter Lian The slides will be modified, annotated, explained on the board,

More information

The Von Neumann Computer Model

The Von Neumann Computer Model The Von Neumann Computer Model Partitioning of the computing engine into components: Central Processing Unit (CPU): Control Unit (instruction decode, sequencing of operations), Datapath (registers, arithmetic

More information

Advanced processor designs

Advanced processor designs Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The

More information

Quantifying Performance EEC 170 Fall 2005 Chapter 4

Quantifying Performance EEC 170 Fall 2005 Chapter 4 Quantifying Performance EEC 70 Fall 2005 Chapter 4 Performance Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation

More information

From CISC to RISC. CISC Creates the Anti CISC Revolution. RISC "Philosophy" CISC Limitations

From CISC to RISC. CISC Creates the Anti CISC Revolution. RISC Philosophy CISC Limitations 1 CISC Creates the Anti CISC Revolution Digital Equipment Company (DEC) introduces VAX (1977) Commercially successful 32-bit CISC minicomputer From CISC to RISC In 1970s and 1980s CISC minicomputers became

More information

ECE 486/586. Computer Architecture. Lecture # 8

ECE 486/586. Computer Architecture. Lecture # 8 ECE 486/586 Computer Architecture Lecture # 8 Spring 2015 Portland State University Lecture Topics Instruction Set Principles MIPS Control flow instructions Dealing with constants IA-32 Fallacies and Pitfalls

More information

EECS4201 Computer Architecture

EECS4201 Computer Architecture Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis These slides are based on the slides provided by the publisher. The slides will be

More information

The Von Neumann Computer Model

The Von Neumann Computer Model The Von Neumann Computer Model Partitioning of the computing engine into components: Central Processing Unit (CPU): Control Unit (instruction decode, sequencing of operations), Datapath (registers, arithmetic

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

Overview of Today s Lecture: Cost & Price, Performance { 1+ Administrative Matters Finish Lecture1 Cost and Price Add/Drop - See me after class

Overview of Today s Lecture: Cost & Price, Performance { 1+ Administrative Matters Finish Lecture1 Cost and Price Add/Drop - See me after class Overview of Today s Lecture: Cost & Price, Performance EE176-SJSU Computer Architecture and Organization Lecture 2 Administrative Matters Finish Lecture1 Cost and Price Add/Drop - See me after class EE176

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture The Computer Revolution Progress in computer technology Underpinned by Moore s Law Makes novel applications

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

Lecture Topics. Branch Condition Options. Branch Conditions ECE 486/586. Computer Architecture. Lecture # 8. Instruction Set Principles.

Lecture Topics. Branch Condition Options. Branch Conditions ECE 486/586. Computer Architecture. Lecture # 8. Instruction Set Principles. ECE 486/586 Computer Architecture Lecture # 8 Spring 2015 Portland State University Instruction Set Principles MIPS Control flow instructions Dealing with constants IA-32 Fallacies and Pitfalls Reference:

More information

Instruction Set Principles and Examples. Appendix B

Instruction Set Principles and Examples. Appendix B Instruction Set Principles and Examples Appendix B Outline What is Instruction Set Architecture? Classifying ISA Elements of ISA Programming Registers Type and Size of Operands Addressing Modes Types of

More information

Computer Architecture. Fall Dongkun Shin, SKKU

Computer Architecture. Fall Dongkun Shin, SKKU Computer Architecture Fall 2018 1 Syllabus Instructors: Dongkun Shin Office : Room 85470 E-mail : dongkun@skku.edu Office Hours: Wed. 15:00-17:30 or by appointment Lecture notes nyx.skku.ac.kr Courses

More information

Evolution of ISAs. Instruction set architectures have changed over computer generations with changes in the

Evolution of ISAs. Instruction set architectures have changed over computer generations with changes in the Evolution of ISAs Instruction set architectures have changed over computer generations with changes in the cost of the hardware density of the hardware design philosophy potential performance gains One

More information

Instruction Set Architecture. "Speaking with the computer"

Instruction Set Architecture. Speaking with the computer Instruction Set Architecture "Speaking with the computer" The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture Digital Design

More information

EE382 Processor Design. Class Objectives

EE382 Processor Design. Class Objectives EE382 Processor Design Stanford University Winter Quarter 1998-1999 Instructor: Michael Flynn Teaching Assistant: Steve Chou Administrative Assistant: Susan Gere Lecture 1 - Introduction Slide 1 Class

More information

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)

More information

Processor Architecture

Processor Architecture Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong (jinkyu@skku.edu)

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

CS/ECE 752: Advanced Computer Architecture 1. Lecture 1: What is Computer Architecture?

CS/ECE 752: Advanced Computer Architecture 1. Lecture 1: What is Computer Architecture? CS/ECE 752: Advanced Computer Architecture 1 Lecture 1: What is Computer Architecture? Karu Sankaralingam University of Wisconsin-Madison karu@cs.wisc.edu Slides courtesy of Stephen W. Keckler, UT-Austin

More information

Outline. What Makes a Good ISA? Programmability. Implementability. Programmability Easy to express programs efficiently?

Outline. What Makes a Good ISA? Programmability. Implementability. Programmability Easy to express programs efficiently? Outline Instruction Sets in General MIPS Assembly Programming Other Instruction Sets Goals of ISA Design RISC vs. CISC Intel x86 (IA-32) What Makes a Good ISA? Programmability Easy to express programs

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

EE282 Computer Architecture. Lecture 1: What is Computer Architecture? EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer

More information

EC-801 Advanced Computer Architecture

EC-801 Advanced Computer Architecture EC-801 Advanced Computer Architecture Lecture 5 Instruction Set Architecture I Dr Hashim Ali Fall 2018 Department of Computer Science and Engineering HITEC University Taxila!1 Instruction Set Architecture

More information

Computer Architecture

Computer Architecture Computer Architecture Mehran Rezaei m.rezaei@eng.ui.ac.ir Welcome Office Hours: TBA Office: Eng-Building, Last Floor, Room 344 Tel: 0313 793 4533 Course Web Site: eng.ui.ac.ir/~m.rezaei/architecture/index.html

More information

Instruction Set Architecture

Instruction Set Architecture Instruction Set Architecture Instructor: Preetam Ghosh Preetam.ghosh@usm.edu CSC 626/726 Preetam Ghosh Language HLL : High Level Language Program written by Programming language like C, C++, Java. Sentence

More information

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines CS450/650 Notes Winter 2013 A Morton Superscalar Pipelines 1 Scalar Pipeline Limitations (Shen + Lipasti 4.1) 1. Bounded Performance P = 1 T = IC CPI 1 cycletime = IPC frequency IC IPC = instructions per

More information

Fundamentals of Computers Design

Fundamentals of Computers Design Computer Architecture J. Daniel Garcia Computer Architecture Group. Universidad Carlos III de Madrid Last update: September 8, 2014 Computer Architecture ARCOS Group. 1/45 Introduction 1 Introduction 2

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

CSE 141 Computer Architecture Spring Lecture 3 Instruction Set Architecute. Course Schedule. Announcements

CSE 141 Computer Architecture Spring Lecture 3 Instruction Set Architecute. Course Schedule. Announcements CSE141: Introduction to Computer Architecture CSE 141 Computer Architecture Spring 2005 Lecture 3 Instruction Set Architecute Pramod V. Argade April 4, 2005 Instructor: TAs: Pramod V. Argade (p2argade@cs.ucsd.edu)

More information

Advanced issues in pipelining

Advanced issues in pipelining Advanced issues in pipelining 1 Outline Handling exceptions Supporting multi-cycle operations Pipeline evolution Examples of real pipelines 2 Handling exceptions 3 Exceptions In pipelined execution, one

More information

Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) Instruction Set Architecture (ISA)... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data

More information

Multi-cycle Instructions in the Pipeline (Floating Point)

Multi-cycle Instructions in the Pipeline (Floating Point) Lecture 6 Multi-cycle Instructions in the Pipeline (Floating Point) Introduction to instruction level parallelism Recap: Support of multi-cycle instructions in a pipeline (App A.5) Recap: Superpipelining

More information

Reminder: tutorials start next week!

Reminder: tutorials start next week! Previous lecture recap! Metrics of computer architecture! Fundamental ways of improving performance: parallelism, locality, focus on the common case! Amdahl s Law: speedup proportional only to the affected

More information

ARSITEKTUR SISTEM KOMPUTER. Wayan Suparta, PhD 17 April 2018

ARSITEKTUR SISTEM KOMPUTER. Wayan Suparta, PhD   17 April 2018 ARSITEKTUR SISTEM KOMPUTER Wayan Suparta, PhD https://wayansuparta.wordpress.com/ 17 April 2018 Reduced Instruction Set Computers (RISC) CISC Complex Instruction Set Computer RISC Reduced Instruction Set

More information

ECE 486/586. Computer Architecture. Lecture # 7

ECE 486/586. Computer Architecture. Lecture # 7 ECE 486/586 Computer Architecture Lecture # 7 Spring 2015 Portland State University Lecture Topics Instruction Set Principles Instruction Encoding Role of Compilers The MIPS Architecture Reference: Appendix

More information

Instruction Set Design

Instruction Set Design Instruction Set Design software instruction set hardware CPE442 Lec 3 ISA.1 Instruction Set Architecture Programmer's View ADD SUBTRACT AND OR COMPARE... 01010 01110 10011 10001 11010... CPU Memory I/O

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

Lecture 4: RISC Computers

Lecture 4: RISC Computers Lecture 4: RISC Computers Introduction Program execution features RISC characteristics RISC vs. CICS Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) represents an important

More information

CSCE 5610: Computer Architecture

CSCE 5610: Computer Architecture HW #1 1.3, 1.5, 1.9, 1.12 Due: Sept 12, 2018 Review: Execution time of a program Arithmetic Average, Weighted Arithmetic Average Geometric Mean Benchmarks, kernels and synthetic benchmarks Computing CPI

More information

Performance of computer systems

Performance of computer systems Performance of computer systems Many different factors among which: Technology Raw speed of the circuits (clock, switching time) Process technology (how many transistors on a chip) Organization What type

More information

Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding effects of underlying architecture

Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding effects of underlying architecture Chapter 2 Note: The slides being presented represent a mix. Some are created by Mark Franklin, Washington University in St. Louis, Dept. of CSE. Many are taken from the Patterson & Hennessy book, Computer

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Computer Systems Architecture Spring 2016

Computer Systems Architecture Spring 2016 Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,

More information

Fundamentals of Computer Design

Fundamentals of Computer Design Fundamentals of Computer Design Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568/668

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568/668 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568/668 Part 1 Introduction Israel Koren ECE568/Koren Part.1.1 Coping with ECE 568/668 Students with varied

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology The Computer Revolution Progress in computer technology Underpinned by Moore

More information

Computer Architecture

Computer Architecture Computer Architecture Lecture 3: ISA Tradeoffs Dr. Ahmed Sallam Suez Canal University Spring 2015 Based on original slides by Prof. Onur Mutlu Design Point A set of design considerations and their importance

More information

Computer Performance. Reread Chapter Quiz on Friday. Study Session Wed Night FB 009, 5pm-6:30pm

Computer Performance. Reread Chapter Quiz on Friday. Study Session Wed Night FB 009, 5pm-6:30pm Computer Performance He said, to speed things up we need to squeeze the clock Reread Chapter 1.4-1.9 Quiz on Friday. Study Session Wed Night FB 009, 5pm-6:30pm L15 Computer Performance 1 Why Study Performance?

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

15-740/ Computer Architecture Lecture 4: Pipelining. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 4: Pipelining. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 4: Pipelining Prof. Onur Mutlu Carnegie Mellon University Last Time Addressing modes Other ISA-level tradeoffs Programmer vs. microarchitect Virtual memory Unaligned

More information

EN2910A: Advanced Computer Architecture Topic 03: Superscalar core architecture

EN2910A: Advanced Computer Architecture Topic 03: Superscalar core architecture EN2910A: Advanced Computer Architecture Topic 03: Superscalar core architecture Prof. Sherief Reda School of Engineering Brown University Material from: Mostly from Modern Processor Design by Shen and

More information

EC 413 Computer Organization

EC 413 Computer Organization EC 413 Computer Organization Review I Prof. Michel A. Kinsy Computing: The Art of Abstraction Application Algorithm Programming Language Operating System/Virtual Machine Instruction Set Architecture (ISA)

More information

Designing for Performance. Patrick Happ Raul Feitosa

Designing for Performance. Patrick Happ Raul Feitosa Designing for Performance Patrick Happ Raul Feitosa Objective In this section we examine the most common approach to assessing processor and computer system performance W. Stallings Designing for Performance

More information

CO Computer Architecture and Programming Languages CAPL. Lecture 15

CO Computer Architecture and Programming Languages CAPL. Lecture 15 CO20-320241 Computer Architecture and Programming Languages CAPL Lecture 15 Dr. Kinga Lipskoch Fall 2017 How to Compute a Binary Float Decimal fraction: 8.703125 Integral part: 8 1000 Fraction part: 0.703125

More information

Math 230 Assembly Programming (AKA Computer Organization) Spring 2008

Math 230 Assembly Programming (AKA Computer Organization) Spring 2008 Math 230 Assembly Programming (AKA Computer Organization) Spring 2008 MIPS Intro II Lect 10 Feb 15, 2008 Adapted from slides developed for: Mary J. Irwin PSU CSE331 Dave Patterson s UCB CS152 M230 L10.1

More information

CS61C - Machine Structures. Week 6 - Performance. Oct 3, 2003 John Wawrzynek.

CS61C - Machine Structures. Week 6 - Performance. Oct 3, 2003 John Wawrzynek. CS61C - Machine Structures Week 6 - Performance Oct 3, 2003 John Wawrzynek http://www-inst.eecs.berkeley.edu/~cs61c/ 1 Why do we worry about performance? As a consumer: An application might need a certain

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Number Representation 09212011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Recap Logic Circuits for Register Transfer

More information