Instructor Information


Instructor Information

55:132/22C:160 High Performance Computer Architecture, Spring 2008

Instructor: Jon Kuhl (That's me)
- Office: 406A SC
- Office Hours: 10:30-11:30 a.m. MWF (other times by appointment)
- E-mail: kuhl@engineering.uiowa.edu
- Phone: (319)

TA: Prasidha Mohandas
- Office: 33 SC
- Office hours: t.b.d.

Class Info
- Website:

Texts:
- Required: Shen and Lipasti, Modern Processor Design: Fundamentals of Superscalar Processors, McGraw Hill.
- Supplemental: Thomas and Moorby, The Verilog Hardware Description Language, Third Edition, Kluwer Academic Publishers, 1996.
- Additional reference: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, Fourth Edition, Morgan Kaufmann, 2007.

Course Objectives
- Understand quantitative measures for assessing and comparing processor performance
- Understand modern processor design techniques, including:
  - pipelining
  - instruction-level parallelism
  - multi-threading
  - high-performance memory architecture
- Master the use of modern design tools (HDLs) to design and analyze processors
- Do case studies of contemporary processors
- Discuss future trends in processor design

Expected Background

A previous course in computer architecture/organization covering:
- Instruction set architecture (ISA): addressing modes, assembly language
- Basic computer organization
- Memory system organization: cache, virtual memory, etc.
22C:060 or 55:035 or equivalent.

Course Organization
- Homework assignments: several
- Two projects (design/analysis exercises using the Verilog HDL and the ModelSim simulation environment)
- Two exams:
  - Midterm: Wed. March 12, in class
  - Final: Tues. May 13, 2:15-4:15 p.m.

Course Organization (continued)
Grading:
- Better of midterm/final exam scores: 35%
- Poorer of midterm/final exam scores: 25%
- Homework: 10%
- Projects: 30%

Historical Perspectives
The decade of the 1970s: Birth of Microprocessors
- Programmable controllers
- Single-chip microprocessors
- Personal computers (PC)
The decade of the 1980s: Quantitative Architecture
- Instruction pipelining
- Fast cache memories
- Compiler considerations
- Workstations
The decade of the 1990s: Instruction-Level Parallelism
- Superscalar, speculative microarchitectures
- Aggressive compiler optimizations
- Low-cost desktop supercomputing

Moore's Law

Moore's Law (1965): the number of devices that can be integrated on a single piece of silicon will double roughly every 18-24 months.
Moore's law has held true for 40 years and will continue to hold for at least another decade.

[Chart: Intel microprocessor transistor counts by year]

[Chart: processor performance by year, 1982-1997, for DEC Alpha (21264, 21164, 21064, AXP/500), SUN-4, MIPS M/120, MIPS M2000, IBM RS6000, IBM POWER, and HP 9000/750 machines]
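As a back-of-the-envelope illustration (my own sketch, not from the slides), the growth factor implied by a fixed doubling period follows directly from the slide's statement; the 40-year span matches the claim above, while the doubling period is a parameter:

```python
# Illustrative only: growth factor implied by Moore's Law for an assumed
# doubling period. 24 months over 40 years gives 2**20, about a million-fold.

def moore_growth(years: float, doubling_months: float) -> float:
    """Multiplicative growth in device count over `years`."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings

for months in (18, 24):
    print(f"doubling every {months} months for 40 years: "
          f"{moore_growth(40, months):,.0f}x")
```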

Evolution of Single-Chip Micros

                     1970s       1980s      1990s          2000s
Transistor count:    10K-100K    100K-1M    1M-100M        100M-1B
Clock frequency:     0.2-2 MHz   2-20 MHz   20 MHz-1 GHz   1-10 GHz
Instructions/cycle:  < 0.1       0.1-0.9    0.9-1.9        1.9-2.9 (?)
MIPS/MFLOPS:         < 1         1-20       20-1,000       1,000-100,000

Performance Growth in Perspective
Doubling every 24 months (1971-2007): a total of roughly 260,000X. At that rate:
- Cars would travel at 25 million MPH and get 5 million miles/gallon
- Air travel: L.A. to N.Y. in 0.1 seconds
- Corn yield: 50 million bushels per acre

A Quote from Robert Cringely
"If the automobile had followed the same development as the computer, a Rolls-Royce would today cost $100, get a million miles per gallon, and explode once a year, killing everyone inside."

Convergence of Key Enabling Technologies
VLSI:
- Submicron CMOS feature sizes: Intel is shipping 45nm chips and has demonstrated 32nm (2x increase in density every 2 years)
- Metal layers: 3 -> 4 -> 5 -> 6 -> 9 (copper)
- Power supply voltage: 5V -> 3.3V -> 2.4V -> 1.8V -> 0.8V
CAD tools:
- Interconnect simulation and critical path analysis
- Clock signal propagation analysis
- Process simulation and yield analysis/learning
Microarchitecture:
- Superpipelined and superscalar machines
- Speculative and dynamic microarchitectures
- Simulation tools and emulation systems
Compilers:
- Extraction of instruction-level parallelism
- Aggressive and speculative code scheduling
- Object code translation and optimization

Instruction Set Processing

ARCHITECTURE (ISA): the programmer/compiler view
- Functional appearance (interface) to the user/system programmer
- Opcodes, addressing modes, architected registers, IEEE floating point
- Serves as the specification for processor design

IMPLEMENTATION (microarchitecture): the processor designer view
- Logical structure or organization that performs the architecture
- Pipelining, functional units, caches, physical registers

REALIZATION (chip): the chip/system designer view
- Physical structure that embodies the implementation
- Gates, cells, transistors, wires

Iron Law

  Time      Instructions       Cycles           Time
  ------- = ------------  x  -----------  x  -------
  Program   Program           Instruction      Cycle
            (code size)       (CPI)            (cycle time)

Processor performance is the reciprocal of Time/Program.

Architecture --> Implementation --> Realization
(Compiler designer --> processor designer --> chip designer)

Iron Law (continued)
- Instructions/Program: instructions executed, not static code size; determined by algorithm, compiler, ISA
- Cycles/Instruction: determined by ISA and CPU organization; overlap among instructions reduces this term
- Time/Cycle: determined by technology, organization, clever circuit design

Overall Goal
- Minimize time, which is the product, NOT isolated terms
- A common error is to miss terms while devising optimizations, e.g., an ISA change that decreases instruction count BUT leads to a CPU organization that makes the clock slower
- Bottom line: the terms are inter-related
- This is the crux of the RISC vs. CISC argument
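A minimal sketch of the Iron Law as code (my example values, not from the course):

```python
def exec_time_s(instructions: float, cpi: float, cycle_time_ns: float) -> float:
    """Iron Law: time/program = instruction count x CPI x cycle time."""
    return instructions * cpi * cycle_time_ns * 1e-9

# 1e9 dynamic instructions at CPI 1.5 on a 1 ns (1 GHz) clock -> 1.5 seconds.
print(exec_time_s(1e9, 1.5, 1.0))
```

Halving any one factor halves the product, which is why optimizations must be judged on all three terms at once.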

Instruction Set Architecture

ISA: the boundary between software and hardware
- Specifies the logical machine that is visible to the programmer
- Also serves as a functional spec for the processor designers

What needs to be specified by an ISA:
- Operations: what to perform and what to perform next
- Temporary operand storage in the CPU: accumulator, stacks, registers
- Number of operands per instruction
- Operand location: where and how to specify the operands
- Type and size of operands
- Instruction-to-binary encoding

Operand Storage
- Registers (in processor) vs. memory?
  - faster access
  - shorter address
- Accumulator:
  - less hardware
  - high memory traffic, a likely bottleneck
- Stack, LIFO (1960s-70s):
  - simple addressing (top of stack is implicit)
  - a bottleneck while pipelining (why?)
  - note: the JAVA VM is stack-based
- Registers, 8 to 256 words:
  - flexible: temporaries and variables
  - registers must be named: code density and a second name space

Caches vs. Registers
Registers are:
- faster (no addressing modes, no tags)
- deterministic (no misses)
- replicable for more ports
- named by a short identifier
But:
- they must be saved/restored on procedure calls
- you can't take the address of a register (distinct from memory)
- they are fixed size (FP, strings, structures)
- compilers must manage them (an advantage?)

Registers vs. Caches (continued)
How many registers? More registers means:
- operands are held longer (reducing memory traffic and run time)
- longer register specifiers (except with register windows)
- slower registers
- more state, which slows context switches

Operands for ALU Instructions
ALU instructions require operands.
- Number of explicit operands:
  - two: Ri := Ri op Rj
  - three: Ri := Rj op Rk
- Operands in registers or memory:
  - any combination: VAX (variable-length instructions)
  - at least one register: IBM 360/370
  - all registers: Cray, RISCs (separate load/store instructions)

VAX Addressing Modes
- register: Ri
- immediate: #n
- displacement: M[Ri + #n]
- register indirect: M[Ri]
- indexed: M[Ri + Rj]
- memory indirect: M[M[Ri]]
- auto-increment: M[Ri]; Ri += d
- auto-decrement: M[Ri]; Ri -= d
- scaled: M[Ri + #n + Rj * d]
- update: M[Ri = Ri + #n]
- absolute: M[#n]
Modes 1-4 account for 93% of all VAX operands [Clark and Emer].

Operations
- arithmetic and logical: and, add
- data transfer: move, load, store
- control: branch, jump, call
- system: system call, traps
- floating point: add, mul, div, sqrt
- decimal: addd, convert
- string: move, compare
- multimedia? 2D, 3D? e.g., Intel MMX/SSE and Sun VIS
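To make the mode semantics concrete, here is a hedged sketch (mine, not the lecture's) of how a few of these modes compute an effective address over toy register and memory structures:

```python
# Toy machine state: 16 GPRs and a sparse word-addressed memory.
regs = [0] * 16
mem = {}

def displacement(ri: int, n: int) -> int:       # M[Ri + #n]
    return regs[ri] + n

def register_indirect(ri: int) -> int:          # M[Ri]
    return regs[ri]

def indexed(ri: int, rj: int) -> int:           # M[Ri + Rj]
    return regs[ri] + regs[rj]

def memory_indirect(ri: int) -> int:            # M[M[Ri]]: one extra memory read
    return mem.get(regs[ri], 0)

def autoincrement(ri: int, d: int = 4) -> int:  # M[Ri]; Ri += d (side effect!)
    addr = regs[ri]
    regs[ri] += d
    return addr
```

The side effect in autoincrement, and the extra memory access in memory indirect, hint at why complex modes complicate pipelining.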

Control Instructions (Branches)
1. Types of branches:
   A. Conditional or unconditional
   B. Save PC?
   C. How is the target computed?
      - Single target (immediate, PC+immediate)
      - Multiple targets (register)
2. Branch architectures:
   A. Condition code or condition registers
   B. Register

Save or Restore State
What state?
- function calls: registers (CISC)
- system calls: registers, flags, PC, PSW, etc.
Hardware need not save registers:
- the caller can save registers in use
- the callee can save registers it will use
Hardware register save:
- IBM STM, VAX CALLS
- faster?
Most recent architectures do no register saving, or do implicit register saving with register windows (SPARC).

VAX
DEC, 1977: VAX-11/780
- upward compatible from the PDP-11
- 32-bit words and addresses
- virtual memory
- 16 GPRs (r15 = PC, r14 = SP), CCs
- extremely orthogonal, and memory-memory
- decoded as a byte stream; variable in length
- opcode: operation, # of operands, operand types

Data types:
- 8, 16, 32, 64, 128 bit
- char string: 8 bits/char
- decimal: 4 bits/digit
- numeric string: 8 bits/digit

Addressing modes:
- literal: 6 bits
- 8, 16, 32 bit immediates
- register, register deferred
- 8, 16, 32 bit displacements
- 8, 16, 32 bit displacements deferred
- indexed (scaled)
- autoincrement, autodecrement
- autoincrement deferred

VAX Operations
- data transfer, including string move
- arithmetic and logical (2- and 3-operand)
- control (branch, jump, etc.), AOBLEQ
- function calls: save state
- bit manipulation
- floating point: add, sub, mul, div, polyf
- system: exceptions, VM
- other: crc (cyclic redundancy check), insque (insert in queue)

VAX Encoding Example: addl3 r1, 737(r2), #456
- byte 1: addl3 opcode
- byte 2: mode, r1
- byte 3: mode, r2
- bytes 4-5: 737
- byte 6: mode (immediate)
- bytes 7-10: 456
The VAX has too many modes and formats. The big deal with RISC is not fewer instructions: few modes/formats means fast decoding, which facilitates pipelining.

VAX-11/780
- First implementation of the VAX ISA
- 84% of instructions simple, 9% branches
- loop branches 91% taken, other branches 41% taken
- Operands: register mode 41%, complex addressing 6%
- Implementation: 200 ns cycle => 0.5 MIPS
- 50% of time decoding, simple instructions only
- 10% of time memory stalls
- 2.1 CPI (<< 10.6)

Anatomy of a Modern ISA
- Operations: simple ALU ops, data movement, control transfer
- Temporary operand storage in the CPU: a large general-purpose register (GPR) file
- Number of operands per instruction: triadic, A <- B op C
- Operand location: load-store architecture with register indirect addressing
- Type and size of operands: 32/64-bit integers, IEEE floats
- Instruction-to-binary encoding: fixed width, regular fields
- Exceptions: Intel x86, IBM 390 (aka z900)
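To see why this encoding resists pipelined decoding, here is a small illustrative sketch (mine, with made-up mode names and simplified specifier lengths, not the real VAX encoding): each operand specifier's length depends on its mode byte, so the specifiers must be walked serially before the next instruction can even be located.

```python
# Simplified specifier sizes: mode byte plus payload bytes (assumed values).
SPEC_LEN = {"reg": 1, "disp16": 3, "imm32": 5}

def instr_length(opcode_bytes, specifiers):
    """Walk specifiers in order; each one's length gates finding the next."""
    length = opcode_bytes
    for mode in specifiers:
        length += SPEC_LEN[mode]   # unknown until the mode byte is decoded
    return length

# addl3 r1, 737(r2), #456: 1 opcode byte + 1 + 3 + 5 operand bytes = 10 bytes
print(instr_length(1, ["reg", "disp16", "imm32"]))
```

A fixed-width RISC encoding makes this a constant, so many instructions can be fetched and decoded in parallel.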

Dynamic-Static Interface

Program (Software) <--> Architecture <--> Machine (Hardware)
- Compiler complexity vs. hardware complexity
- Exposed to software vs. hidden in hardware
- The DSI sits at the semantic gap between software and hardware, dividing static (compile-time) from dynamic (run-time) work
- Placement of the DSI determines how the gap is bridged

Dynamic-Static Interface (continued)
HLL Program -> DSI-1 (~DEL) -> DSI-2 (~CISC/VLIW) -> DSI-3 (~RISC) -> Hardware
- A low-level DSI exposes more knowledge of the hardware through the ISA
- It places a greater burden on the compiler/programmer
- Optimized code becomes specific to the implementation
- In fact, this happens for higher-level DSIs also

The Role of the Compiler
Phases to manage complexity:
- Parsing --> intermediate representation
- Procedure inlining
- Loop optimizations
- Common sub-expression elimination
- Jump optimization
- Constant propagation
- Register allocation
- Strength reduction
- Pipeline scheduling
- Code generation --> assembly code

Performance and Cost
Which computer is fastest? Not so simple:
- Scientific simulation: FP performance
- Program development: integer performance
- Commercial workload: memory, I/O

Performance of Computers
Want to buy the fastest computer for what you want to do?
- Workload is all-important
- Correct measurement and analysis
Want to design the fastest computer for what the customer wants to pay?
- Cost is always an important criterion
Speed is not always the only performance criterion:
- Power
- Area

Defining Performance
What is important to whom?
- Computer system user: minimize elapsed time for a program, time_end - time_start; called response time
- Computer center manager: maximize completion rate, # of jobs/second; called throughput

Improve Performance
Improve (a) response time or (b) throughput?
- Faster CPU: helps both (a) and (b)
- Add more CPUs: helps (b), and perhaps (a) due to less queuing

Performance Comparison
- Machine A is n times faster than machine B iff perf(A)/perf(B) = time(B)/time(A) = n
- Machine A is x% faster than machine B iff perf(A)/perf(B) = time(B)/time(A) = 1 + x/100
E.g., time(A) = 10s, time(B) = 15s:
- 15/10 = 1.5 => A is 1.5 times faster than B
- 15/10 = 1.5 => A is 50% faster than B
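The slide's example, checked in a couple of lines (values from the slide):

```python
time_a, time_b = 10.0, 15.0           # seconds
n_times = time_b / time_a             # 1.5 -> A is 1.5 times faster than B
percent = (time_b / time_a - 1) * 100 # 50.0 -> A is 50% faster than B
print(n_times, percent)
```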

Other Metrics: MIPS and MFLOPS

MIPS = instruction count / (execution time x 10^6) = clock rate / (CPI x 10^6)
But MIPS has serious shortcomings.

Problems with MIPS
E.g., without FP hardware, an FP op may take 50 single-cycle instructions; with FP hardware, only one 2-cycle instruction. Thus, adding FP hardware:
- CPI increases (why?)
- Instructions/program decreases (why?)
- Total execution time decreases
- BUT, MIPS gets worse! Cycles/instructions per FP op: 50/50 => 2/1, so CPI goes from 1 to 2 and the MIPS rating is cut in half even as the program speeds up.

Problems with MIPS (continued)
- Ignores the program
- Usually used to quote peak performance: ideal conditions, so it is a "guaranteed not to exceed" figure!
When is MIPS ok?
- Same compiler, same ISA; e.g., the same binary running on a Pentium III and a Pentium 4
- Why? Instructions/program is constant and can be ignored

Other Metrics: MFLOPS
MFLOPS = FP ops in program / (execution time x 10^6)
- Assumes FP ops are independent of compiler and ISA
- Often safe for numeric codes: matrix size determines the # of FP ops/program
- However, not always safe:
  - Missing instructions (e.g., FP divide, sqrt/sin/cos)
  - Optimizing compilers
Relative MIPS and normalized MFLOPS:
- Normalized to some common baseline machine
- E.g., VAX MIPS in the 1980s
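The FP-hardware paradox above can be reproduced directly; the 100 MHz clock here is an assumed value just to make the numbers concrete:

```python
clock_hz = 100e6   # assumed clock rate, not from the slides

# One FP operation: 50 single-cycle instructions vs. 1 two-cycle instruction.
for label, instrs, cycles in (("software FP (50 x 1-cycle)", 50, 50),
                              ("hardware FP (1 x 2-cycle)", 1, 2)):
    t = cycles / clock_hz
    print(f"{label}: time = {t * 1e9:.0f} ns, MIPS = {instrs / (t * 1e6):.0f}")
```

Execution time improves 25x (500 ns to 20 ns) while the MIPS rating falls from 100 to 50: the metric moves opposite to actual performance.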

Iron Law Example
Machine A: clock 1 ns, CPI 2.0, for program X.
Machine B: clock 2 ns, CPI 1.2, for program X.
Which is faster, and by how much?
- Time/Program = instr/program x cycles/instr x sec/cycle
- Time(A) = N x 2.0 x 1 = 2N
- Time(B) = N x 1.2 x 2 = 2.4N
- Compare: Time(B)/Time(A) = 2.4N/2N = 1.2
So, Machine A is 20% faster than Machine B for this program.

Iron Law Example (continued)
Keep clock(A) = 1 ns. For equal performance, if CPI(B) = 1.2, what is CPI(A)?
- Time(B)/Time(A) = 1 = (N x 1.2 x 2)/(N x 1 x CPI(A))
- CPI(A) = 2.4

Iron Law Example (continued)
Keep CPI(A) = 2.0 and CPI(B) = 1.2. For equal performance, if clock(B) = 2 ns, what is clock(A)?
- Time(B)/Time(A) = 1 = (N x 2.0 x clock(A))/(N x 1.2 x 2)
- clock(A) = 1.2 ns

Another Example
OP      Freq   Cycles
ALU     43%    1
Load    21%    1
Store   12%    2
Branch  24%    2
Assume stores can execute in 1 cycle by slowing the clock 15%. Should this be implemented?
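A quick numeric check of the three worked examples above (the instruction count N cancels out):

```python
def time_per_prog(n_instr, cpi, cycle_ns):
    """Iron Law, with time in ns."""
    return n_instr * cpi * cycle_ns

n = 1.0  # cancels in every ratio below
print(time_per_prog(n, 1.2, 2.0) / time_per_prog(n, 2.0, 1.0))  # 1.2 -> A 20% faster

# Equal-performance variants:
print(1.2 * 2.0 / 1.0)   # CPI(A) = 2.4 when clock(A) stays 1 ns
print(1.2 * 2.0 / 2.0)   # clock(A) = 1.2 ns when CPI(A) stays 2.0
```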

Example (continued): let's do the math.
OP      Freq   Cycles
ALU     43%    1
Load    21%    1
Store   12%    2
Branch  24%    2
- Old CPI = 0.43 x 1 + 0.21 x 1 + 0.12 x 2 + 0.24 x 2 = 1.36
- New CPI = 0.43 x 1 + 0.21 x 1 + 0.12 x 1 + 0.24 x 2 = 1.24
- Speedup = old time / new time = {P x old CPI x T} / {P x new CPI x 1.15T} = 1.36/(1.24 x 1.15) = 0.95
Answer: don't make the change.

Which Programs?
Execution time of what program?
- Best case: you always run the same set of programs; port them and time the whole workload
- In reality, use benchmarks:
  - Programs chosen to measure performance
  - Predict performance of the actual workload
  - Saves effort and money
  - Representative? Honest? "Benchmarketing"...

Types of Benchmarks
- Real programs: representative of a real workload; the only accurate way to characterize performance; requires considerable work
- Kernels or microbenchmarks: representative program fragments; good for focusing on individual features, not the big picture
- Instruction mixes: instruction frequency of occurrence; used to calculate CPI

Benchmarks: SPEC2000
- System Performance Evaluation Cooperative: formed in the 1980s to combat benchmarketing
- SPEC89, SPEC92, SPEC95, now SPEC2000
- 12 integer and 14 floating-point programs
- A 300 MHz Sun Ultra-5 reference machine has a score of 100
- Report the geometric mean of ratios to the reference machine
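The SPEC scoring rule above, sketched in code; the benchmark times are hypothetical, chosen only to show the mechanics:

```python
from math import prod

def spec_ratio(ref_time, run_time):
    """Per-benchmark ratio; the reference machine scores 100 by construction."""
    return ref_time / run_time * 100

def spec_score(ratios):
    """Overall score: geometric mean of the per-benchmark ratios."""
    return prod(ratios) ** (1 / len(ratios))

# Hypothetical (reference_time, measured_time) pairs in seconds:
pairs = [(1400, 700), (1200, 400), (1800, 900), (1600, 400)]
ratios = [spec_ratio(r, t) for r, t in pairs]
print(round(spec_score(ratios)))   # about 263
```

The geometric mean is used so that no single benchmark's ratio dominates the summary the way it would in an arithmetic mean.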

Benchmarks: SPEC CINT2000

Benchmark     Description
164.gzip      Compression
175.vpr       FPGA place and route
176.gcc       C compiler
181.mcf       Combinatorial optimization
186.crafty    Chess
197.parser    Word processing, grammatical analysis
252.eon       Visualization (ray tracing)
253.perlbmk   PERL script execution
254.gap       Group theory interpreter
255.vortex    Object-oriented database
256.bzip2     Compression
300.twolf     Place and route simulator

Benchmarks: SPEC CFP2000

Benchmark     Description
168.wupwise   Physics/quantum chromodynamics
171.swim      Shallow water modeling
172.mgrid     Multi-grid solver: 3D potential field
173.applu     Parabolic/elliptic PDE
177.mesa      3-D graphics library
178.galgel    Computational fluid dynamics
179.art       Image recognition/neural networks
183.equake    Seismic wave propagation simulation
187.facerec   Image processing: face recognition
188.ammp      Computational chemistry
189.lucas     Number theory/primality testing
191.fma3d     Finite-element crash simulation
200.sixtrack  High energy nuclear physics accelerator design
301.apsi      Meteorology: pollutant distribution

Benchmark Pitfalls
- Benchmark not representative: if your workload is I/O bound, SPECint is useless
- Benchmark is too old: benchmarks age poorly; benchmarketing pressure causes vendors to optimize compilers/hardware/software to the benchmarks, so they need to be refreshed periodically

Benchmark Pitfalls (continued)
- Choosing a benchmark from the wrong application space: e.g., in a realtime environment, choosing gcc
- Choosing benchmarks from no application space: e.g., synthetic workloads, especially unvalidated ones
- Using toy benchmarks (dhrystone, whetstone): e.g., used to "prove" the value of RISC in the early 1980s
- Mismatch of benchmark properties with the scale of features studied: e.g., using SPECint for large cache studies

Benchmark Pitfalls (continued)
- Carelessly scaling benchmarks:
  - Truncating benchmarks
  - Using only the first few million instructions
  - Reducing program data size
- Too many easy cases: may not show the value of a feature
- Too few easy cases: may exaggerate the importance of a feature

Scalar to Superscalar
- Scalar processor: fetches and issues at most one instruction per machine cycle
- Superscalar processor: fetches and issues multiple instructions per machine cycle
- Can also define superscalar in terms of how many instructions can complete execution in a given machine cycle
- Note that only a superscalar architecture can achieve a CPI of less than 1

Processor Performance

  Time      Instructions       Cycles           Time
  ------- = ------------  x  -----------  x  -------
  Program   Program           Instruction      Cycle
            (code size)       (CPI)            (cycle time)

- In the 1980s (decade of pipelining): CPI 5.0 => 1.5
- In the 1990s (decade of superscalar): CPI 1.5 => 0.5 (best case)

Amdahl's Law
(Originally formulated for vector processing.)
- f = fraction of the program that is vectorizable
- (1-f) = fraction that is serial
- N = speedup for the vectorizable portion
Overall speedup:

  Speedup = 1 / ((1 - f) + f/N)

[Diagram: N processors active for fraction f of the time, one processor for fraction (1-f)]
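Amdahl's Law as a one-liner, for experimenting with f and N (my sketch of the formula above):

```python
def amdahl(f, n):
    """Overall speedup when fraction f of the work gets speedup n."""
    return 1.0 / ((1.0 - f) + f / n)

print(round(amdahl(0.9, 10), 2))    # 5.26: 10-way parallelism, 90% vectorizable
print(round(amdahl(0.9, 1e12), 2))  # 10.0: the N -> infinity limit is 1/(1-f)
```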

Amdahl's Law (continued)
Sequential bottleneck: even if N is infinite, performance is limited by the nonvectorizable portion (1-f):

  lim (N -> infinity) 1 / ((1 - f) + f/N) = 1/(1 - f)

Ramifications of Amdahl's Law
- Consider f = 0.9, (1-f) = 0.1: for N -> infinity, Speedup -> 10
- Consider f = 0.5, (1-f) = 0.5: for N -> infinity, Speedup -> 2
- Consider f = 0.1, (1-f) = 0.9: for N -> infinity, Speedup -> 1.1

[Plot: maximum achievable speedup vs. parallelizable fraction f]

Pipelining
Unpipelined operation: inputs I1, I2, I3, ... produce outputs O1, O2, ..., each requiring time T.
- Time required to process K inputs = KT
Perfect pipeline of N stages, T/N per stage: Stage 1 | Stage 2 | Stage 3 | ... | Stage N
- Time required to process K inputs = (K + N - 1)(T/N)
- Note: for K >> N, the processing time approaches KT/N, i.e., a speedup of N
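The fill/drain effect in the pipeline timing above is easy to tabulate (my sketch, using the slide's formula with T = 1):

```python
def pipeline_time(k, n, t=1.0):
    """Time for k inputs on an n-stage pipeline: fill (n - 1) then one per stage."""
    return (k + n - 1) * (t / n)

N = 10
for k in (1, 10, 100, 1000):
    print(k, round(k * 1.0 / pipeline_time(k, N), 2))  # speedup -> N as k grows
```

For k = 1 there is no speedup at all; only long runs of back-to-back inputs approach the ideal factor of N.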

Pipelined Performance Model (Amdahl's Law Applied to Pipelining)
- g = fraction of time the pipeline is full
- (1-g) = fraction of time the pipeline is not full (stalled)
- N = pipeline depth (number of stages)
Overall speedup:

  Speedup = 1 / ((1 - g) + g/N)

[Diagram: N-deep pipeline, full for fraction g of the time, stalled for fraction (1-g)]

Tyranny of Amdahl's Law [Bob Colwell]
- When g is even slightly below 100%, a big performance hit results
- Stalled cycles are the key adversary and must be minimized as much as possible
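A few values make Colwell's point vivid (my sketch of the model above, for an assumed 10-stage pipeline):

```python
def pipeline_speedup(g, n):
    """Speedup of an n-deep pipeline that is full a fraction g of the time."""
    return 1.0 / ((1.0 - g) + g / n)

N = 10
for g in (1.0, 0.99, 0.9, 0.5):
    print(g, round(pipeline_speedup(g, N), 2))  # 10.0, 9.17, 5.26, 1.82
```

Dropping from g = 1.0 to g = 0.9 forfeits nearly half of the 10-stage pipeline's ideal speedup.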

Superscalar Proposal
Moderate the tyranny of Amdahl's Law:
- Ease the sequential bottleneck
- More generally applicable
- Robust (less sensitive to f)
Revised Amdahl's Law, with s = amount of parallelism for nonvectorizable instructions:

  Speedup = 1 / ((1 - f)/s + f/N)

Motivation for Superscalar [Agerwala and Cocke]
- Speedup jumps from 3 to 4.3 for N = 6 and f = 0.8 when s = 2 instead of s = 1 (scalar)

[Plot: speedup vs. vectorizability f for several machine sizes, including n = 6 with s = 2; the typical range of f is marked]

Limits on Instruction-Level Parallelism (ILP)
Weiss and Smith [1984]        1.58
Sohi and Vajapeyam [1987]     1.8
Tjaden and Flynn [1970]       1.86  (Flynn's bottleneck)
Tjaden and Flynn [1973]       1.96
Uht [1986]                    2.00
Smith et al. [1989]           2.00
Jouppi and Wall [1988]        2.40
Johnson [1991]                2.50
Acosta et al. [1986]          2.79
Wedig [1982]                  3.00
Butler et al. [1991]          5.8
Melvin and Patt [1991]        6
Wall [1991]                   7   (Jouppi disagreed)
Kuck et al. [1972]            8
Riseman and Foster [1972]     51  (no control dependences)
Nicolau and Fisher [1984]     90  (Fisher's optimism)

Superscalar Proposal (continued)
- Go beyond the single-instruction pipeline; achieve IPC > 1
- Dispatch multiple instructions per cycle
- Provide a more generally applicable form of concurrency (not just vectors)
- Geared for sequential code that is hard to parallelize otherwise
- Exploit fine-grained or instruction-level parallelism (ILP)
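Checking the Agerwala-Cocke numbers against the revised law (my sketch):

```python
def revised_amdahl(f, n, s):
    """Speedup with s-way parallelism even in the nonvectorizable portion."""
    return 1.0 / ((1.0 - f) / s + f / n)

print(round(revised_amdahl(0.8, 6, 1), 2))  # 3.0  -- scalar baseline, s = 1
print(round(revised_amdahl(0.8, 6, 2), 2))  # 4.29 -- s = 2, the quoted jump
```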

Classifying ILP Machines [Jouppi, DECWRL 1991]
Baseline scalar RISC:
- Issue parallelism = IP = 1
- Operation latency = OP = 1
- Peak IPC = 1

Superpipelined (cycle time = 1/m of baseline):
- Issue parallelism = IP = 1 inst / minor cycle
- Operation latency = OP = m minor cycles
- Peak IPC = m instr / major cycle (m x speedup?)
[Diagram: IF-DE-EX-WB stages of successive instructions vs. time in baseline-machine cycles]

Superscalar:
- Issue parallelism = IP = n inst / cycle
- Operation latency = OP = 1 cycle
- Peak IPC = n instr / cycle (n x speedup?)

VLIW (Very Long Instruction Word):
- Issue parallelism = IP = n inst / cycle
- Operation latency = OP = 1 cycle
- Peak IPC = n instr / cycle = 1 VLIW / cycle

Classifying ILP Machines [Jouppi, DECWRL 1991] (continued)
Superpipelined-Superscalar:
- Issue parallelism = IP = n inst / minor cycle
- Operation latency = OP = m minor cycles
- Peak IPC = n x m instr / major cycle

Superscalar vs. Superpipelined
- Roughly equivalent performance: if n = m, then both have about the same IPC
- Parallelism is exposed in space vs. time
[Diagram: superscalar vs. superpipelined instruction timing (IFetch, Dcode, Execute, Writeback) in base-machine cycles]
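Summarizing the taxonomy in one expression (my sketch; n and m values below are arbitrary examples):

```python
def peak_ipc(n, m):
    """Issue width n per (minor) cycle times superpipelining degree m,
    measured per baseline major cycle."""
    return n * m

print(peak_ipc(1, 1))  # baseline scalar RISC
print(peak_ipc(1, 3))  # superpipelined, m = 3
print(peak_ipc(3, 1))  # superscalar (or VLIW), n = 3
print(peak_ipc(3, 3))  # superpipelined-superscalar
```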
