COSC 243. Computer Architecture 2. Lecture 12 Computer Architecture 2. COSC 243 (Computer Architecture)

Size: px
Start display at page:

Download "COSC 243. Computer Architecture 2. Lecture 12 Computer Architecture 2. COSC 243 (Computer Architecture)"

Transcription

1 COSC 243 1

2 Overview This Lecture Architectural topics CISC RISC Multi-core processors Source: Chapter 15,16 and 17(10 th edition) 2

3 Moore s Law Gordon E. Moore Co-founder of Intel April 1965 The complexity for minimum component costs has increased at a rate of roughly a factor of two per year 1975 Altered his projection to a doubling every two years Often misquoted as double in speed each 18 months! 3

4 Moore s Law 61 core Xeon Ph 22 core Xeon Pentium Pentium III Pentium II Pentium Pro Pentium Core 2 Duo

5 CISC What is the best thing to do with all those transistors? Add extra instructions? Make the CPU do more (integrated cache, etc)? Pipelines? We call these Complex Instruction Set Computers 5

6 High Level Languages As the cost of a computer dropped the relative cost of software went up As computers became more ubiquitous the need to port software from one machine to another increased As the complexity of software went up the need to use high level languages increased Programs today are almost always written in high level languages As time went on languages became higher You could do more in the same number of lines of code 6

7 The Semantic Gap A semantic gap appeared. Programming languages are disconnected from CPU architecture This is part of the purpose of high level languages New instructions were added to the CPU, but: They were not being used by programmers Who wrote in high level languages They were not being used by the compilers It wasn t worthwhile re-writing the compiler for each release of a CPU The new instructions were being ignored We need a CPU optimized for high level language use 7

8 Research In running C programs: Most machine code instructions are call statements Most memory references are due to call statements Perhaps we should optimize those instructions? Operation Run-time Use Instructions Memory Use assignment 38% 13% 15% loop 3% 32% 26% call 12% 33% 45% if 43% 21% 12% goto 3% - - other 1% 1% 1% 8

9 More Research In running C programs: Most references are to local variables Type Use Constants 23% Variables 53% Arrays / Structures 24% 9

10 Even More Research 98% of procedures have fewer than 6 parameters 92% of procedures use fewer than 6 local variables Lets optimize the CPU to make the slow parts faster Lets use all those transistors to make faster and simpler CPUs 10

11 RISC Reduced Instruction Set Computers Three design principles: Large number of registers This reduces the number of memory accesses Careful design of the pipeline for conditional branches Better handling of if statements and procedure calls Simplified instruction set Each instruction does less Fewer addressing modes Often just as many instructions as in a CISC CPU Reduced complexity does not mean reduced number of instructions 11

12 RISC Characteristics: One instruction per cycle This keeps the pipeline simple Register to register operations the norm Simple addressing modes (often only 1) Simple instruction formats Fixed instruction length Aligned on machine word boundaries (for fast CPU load) 12

13 RISC Register Windows A large number of registers addressed relative to a start The out registers of one routine are the in registers of the next These are the parameters Each routine has local registers Upon entry to a routine the start is moved to the next block Some arbitrary depth of nesting (often 8 or over) Only necessary to write to memory after this depth of nested calls Some arbitrary number of registers in each windows (8+) Only necessary to write to memory if more than this number is being used A.In A.Local A.Out B.In B.Local B.Out C.In C.Local C.Out D.In D.Local D.Out 13

14 ARM 32-bit Registers Extra copies of SP and LR in each mode FIRQ has its own set of R8-R12 User System (privileged user) Supervisor (OS) Abort Undefined IRQ FIRQ R0_usr R1_usr R2_usr R3_usr R4_usr R5_usr R6_usr R7_usr R8_usr R8_firq R9_usr R9_firq R10_usr R10_firq R11_usr R11_firq R12_usr R12_firq R13 (SP_usr) SP_svc SP_abt SP_und SP_irq SP_firq R14 (LR_usr) LR_svc LR_abt LR_und LR_irq LR_firq R15 (PC) CSPR spsr_svc spsr_abt spsr_und spsr_irq spsr_firq COSC 243 (Computer Architecture) Lecture 10 - Computer Architecture 2 14

15 ARM Link Register LR, is used in procedure calls 6502: JSR ; Push the return address (minus 1) onto the stack RTS; Pull return address from the stack (and add 1) ARM: Either: Or BL ; Copy PC to LR and branch BX LR ; Branch to where LR points BL ; Copy PC to LR and branch PUSH {LR}; push LR onto the stack POP {PC}; pull LR off the stack and store in PC COSC 243 (Computer Architecture) Lecture 10 - Computer Architecture 2 15

16 CISC vs. RISC RISC requires more program instructions than CISC RISC instructions are simplified Have fewer addressing modes Take less memory space to store CISC does more per instruction But the control unit is more complex (and so slower) The microcode is more complex (and so slower) The microcode is often a RISC program! 16

17 CISC vs. RISC CISC: Minimize instructions per program Increase cycles per instruction RISC: Minimise cycles per instruction Increase instructions per program time program = time cycle cycles instruction instructions program 17

18 Superpipelines If the RISC (or micro-coded CISC) instructions are so simple then it isn t necessary to use an entire clock cycle to perform each stage of the pipeline So we can double the CPU s internal clock speed and do each of the simple operations in half the time This doubles CPU throughput By halving the time to complete each instruction 18

19 Superscalar Why use only one pipeline, lets have two. Then we can execute two instructions at once! This is called instruction-level parallelism. Five limitations to instruction-level parallelism True data dependency (write after read, WAR) Output dependency (write after write, WAW) Antidependency (read after write, RAW) Procedural dependency Conditional branches require a pipeline reload Resource conflicts Both pipelines require access to memory at the same time 19

20 Superscalar However, if there are no dependencies then the instructions need not be executed in the correct order so long as the result is the same This is known as out-of-order execution In-order issue with in-order completion Instructions must start and finish in the correct order In-order issue with out-of-order completion The CPU starts the instructions in order but the second one finished before the first one! Out-of-order issue with out-of-order completion The CPU does the next instruction before the current one! E.g. (TSX then TYA) does the order matter? 20

21 Superscalar A program is a linear sequence of instructions Instruction fetch with branch prediction Produces an instruction stream Stream examined for dependencies Instructions are re-ordered by their dependencies Instructions are executed based on dependencies on each other and the hardware resources Results are recorded or discarded Discarded in the case of speculative prediction 21

22 Superscalar Window of Execution Static Program Instruction Fetch and Branch Prediction Produces an Instruction Stream Instruction Execution Instruction Re-order And Commit Instruction Dispatch Instruction Issue 22

23 Hyperthreading (SMT) We can do more!!!! The CPU slows down when we access memory The pipeline slows down when we have dependencies Can we write programs that do more than one thing at a time but whose parts don t interact (much)? Yes! We can use threading Actually, the OS switches between programs too Perhaps we can build that into the CPU too 23

24 Hyperthreading (SMT) Imagine a superscalar architecture with 2 pipelines Each pipeline reads from a different part of memory Each pipeline has a separate set of registers If one pipeline becomes stalled the other keeps going Two programs are executed at the same time! This is called Simultaneous Multithreading (SMT) This is the approach of the Intel Hyperthreading CPUs Such as the Pentium 4 and later 24

25 Heat! The heat dissipation in a transistor is linear in the switching rate. The faster you switch the more heat you get The total amount of heat generated is linear in the number of transistors on the silicon die Both have been following Moore s law! That s a polynomial increase in power over time Intel i7 Extreme is 130W A bright light bulb is 100W 25

26 Multi-Core How can we reduce the heat? The obvious solution is to go slower How can we go slower and faster at the same time? Instead of having one CPU on the silicon die we put two. They share certain resources including: The buses The level 2 cache 26

27 Vectors What if you want to do the same operation to each element of an array? One way is to use array (wide) registers and use instructions to perform an operation on each element Instruction decoding only occurs once e.g. PADDD xmm0, xmm1 [3] [2] [1] [0] xmm xmm xmm We call this Single Instruction Multiple Data (SIMD) Intel: SSE (128-bit) AVX (256-bit) AVX512 (512-bit) Arrays of 32/64-bit float or 8/16/32/64-bit integers 27

28 Classification of Architectures Single instruction, single data (SISD) Normal computer Single instruction, multiple data (SIMD) Intel SSE instructions / etc. Graphics processors Multiple instruction, multiple data (MIMD) Multi-core 28

29 Coprocessors Graphics co-processors Sound co-processors Math co-processors Physics co-processor Network co-processors Often work by catching illegal instructions and interpreting them for the CPU. 29

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

COSC 6385 Computer Architecture - Thread Level Parallelism (I) COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 14 Instruction Level Parallelism and Superscalar Processors

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 14 Instruction Level Parallelism and Superscalar Processors William Stallings Computer Organization and Architecture 8 th Edition Chapter 14 Instruction Level Parallelism and Superscalar Processors What is Superscalar? Common instructions (arithmetic, load/store,

More information

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function William Stallings Computer Organization and Architecture 8 th Edition Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data

More information

A superscalar machine is one in which multiple instruction streams allow completion of more than one instruction per cycle.

A superscalar machine is one in which multiple instruction streams allow completion of more than one instruction per cycle. CS 320 Ch. 16 SuperScalar Machines A superscalar machine is one in which multiple instruction streams allow completion of more than one instruction per cycle. A superpipelined machine is one in which a

More information

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown

More information

Advanced Processor Architecture

Advanced Processor Architecture Advanced Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 4

ECE 571 Advanced Microprocessor-Based Design Lecture 4 ECE 571 Advanced Microprocessor-Based Design Lecture 4 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 28 January 2016 Homework #1 was due Announcements Homework #2 will be posted

More information

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Advanced Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000

More information

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 GBI0001@AUBURN.EDU ELEC 6200-001: Computer Architecture and Design Silicon Technology Moore s law Moore's Law describes a long-term trend in the history

More information

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections ) Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections 3.8-3.14) 1 ILP Limits The perfect processor: Infinite registers (no WAW or WAR hazards) Perfect branch direction and target

More information

Ti Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr

Ti Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr Ti5317000 Parallel Computing PIPELINING Michał Roziecki, Tomáš Cipr 2005-2006 Introduction to pipelining What is this What is pipelining? Pipelining is an implementation technique in which multiple instructions

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Processor Architecture

Processor Architecture Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong (jinkyu@skku.edu)

More information

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University Advanced d Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000

More information

Computer System Architecture

Computer System Architecture CSC 203 1.5 Computer System Architecture Budditha Hettige Department of Statistics and Computer Science University of Sri Jayewardenepura Microprocessors 2011 Budditha Hettige 2 Processor Instructions

More information

Superscalar Processors

Superscalar Processors Superscalar Processors Superscalar Processor Multiple Independent Instruction Pipelines; each with multiple stages Instruction-Level Parallelism determine dependencies between nearby instructions o input

More information

TECH. CH14 Instruction Level Parallelism and Superscalar Processors. What is Superscalar? Why Superscalar? General Superscalar Organization

TECH. CH14 Instruction Level Parallelism and Superscalar Processors. What is Superscalar? Why Superscalar? General Superscalar Organization CH14 Instruction Level Parallelism and Superscalar Processors Decode and issue more and one instruction at a time Executing more than one instruction at a time More than one Execution Unit What is Superscalar?

More information

Processor Performance and Parallelism Y. K. Malaiya

Processor Performance and Parallelism Y. K. Malaiya Processor Performance and Parallelism Y. K. Malaiya Processor Execution time The time taken by a program to execute is the product of n Number of machine instructions executed n Number of clock cycles

More information

RAID 0 (non-redundant) RAID Types 4/25/2011

RAID 0 (non-redundant) RAID Types 4/25/2011 Exam 3 Review COMP375 Topics I/O controllers chapter 7 Disk performance section 6.3-6.4 RAID section 6.2 Pipelining section 12.4 Superscalar chapter 14 RISC chapter 13 Parallel Processors chapter 18 Security

More information

Multiple Instruction Issue. Superscalars

Multiple Instruction Issue. Superscalars Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths

More information

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moore s Law Gordon Moore @ Intel (1965) 2 Computer Architecture Trends (1)

More information

What is Superscalar? CSCI 4717 Computer Architecture. Why the drive toward Superscalar? What is Superscalar? (continued) In class exercise

What is Superscalar? CSCI 4717 Computer Architecture. Why the drive toward Superscalar? What is Superscalar? (continued) In class exercise CSCI 4717/5717 Computer Architecture Topic: Instruction Level Parallelism Reading: Stallings, Chapter 14 What is Superscalar? A machine designed to improve the performance of the execution of scalar instructions.

More information

ZYNQ/ARM Interrupt Controller

ZYNQ/ARM Interrupt Controller ZYNQ/ARM Interrupt Controller Events, Exceptions (faults), and Interrupts Events are expected or unexpected signals that indicate a new situation may need immediate attention. Events include: Fault conditions,

More information

Intel released new technology call P6P

Intel released new technology call P6P P6 and IA-64 8086 released on 1978 Pentium release on 1993 8086 has upgrade by Pipeline, Super scalar, Clock frequency, Cache and so on But 8086 has limit, Hard to improve efficiency Intel released new

More information

Chapter 13 Reduced Instruction Set Computers

Chapter 13 Reduced Instruction Set Computers Chapter 13 Reduced Instruction Set Computers Contents Instruction execution characteristics Use of a large register file Compiler-based register optimization Reduced instruction set architecture RISC pipelining

More information

RISC & Superscalar. COMP 212 Computer Organization & Architecture. COMP 212 Fall Lecture 12. Instruction Pipeline no hazard.

RISC & Superscalar. COMP 212 Computer Organization & Architecture. COMP 212 Fall Lecture 12. Instruction Pipeline no hazard. COMP 212 Computer Organization & Architecture Pipeline Re-Cap Pipeline is ILP -Instruction Level Parallelism COMP 212 Fall 2008 Lecture 12 RISC & Superscalar Divide instruction cycles into stages, overlapped

More information

Programmazione Avanzata

Programmazione Avanzata Programmazione Avanzata Vittorio Ruggiero (v.ruggiero@cineca.it) Roma, Marzo 2017 Pipeline Outline CPU: internal parallelism? CPU are entirely parallel pipelining superscalar execution units SIMD MMX,

More information

Superscalar Processors

Superscalar Processors Superscalar Processors Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any a performance

More information

EE 4980 Modern Electronic Systems. Processor Advanced

EE 4980 Modern Electronic Systems. Processor Advanced EE 4980 Modern Electronic Systems Processor Advanced Architecture General Purpose Processor User Programmable Intended to run end user selected programs Application Independent PowerPoint, Chrome, Twitter,

More information

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan Course II Parallel Computer Architecture Week 2-3 by Dr. Putu Harry Gunawan www.phg-simulation-laboratory.com Review Review Review Review Review Review Review Review Review Review Review Review Processor

More information

UNIT- 5. Chapter 12 Processor Structure and Function

UNIT- 5. Chapter 12 Processor Structure and Function UNIT- 5 Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data CPU With Systems Bus CPU Internal Structure Registers

More information

Chapter 06: Instruction Pipelining and Parallel Processing. Lesson 14: Example of the Pipelined CISC and RISC Processors

Chapter 06: Instruction Pipelining and Parallel Processing. Lesson 14: Example of the Pipelined CISC and RISC Processors Chapter 06: Instruction Pipelining and Parallel Processing Lesson 14: Example of the Pipelined CISC and RISC Processors 1 Objective To understand pipelines and parallel pipelines in CISC and RISC Processors

More information

Hyperthreading Technology

Hyperthreading Technology Hyperthreading Technology Aleksandar Milenkovic Electrical and Computer Engineering Department University of Alabama in Huntsville milenka@ece.uah.edu www.ece.uah.edu/~milenka/ Outline What is hyperthreading?

More information

Lecture 26: Parallel Processing. Spring 2018 Jason Tang

Lecture 26: Parallel Processing. Spring 2018 Jason Tang Lecture 26: Parallel Processing Spring 2018 Jason Tang 1 Topics Static multiple issue pipelines Dynamic multiple issue pipelines Hardware multithreading 2 Taxonomy of Parallel Architectures Flynn categories:

More information

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction)

EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) EN164: Design of Computing Systems Topic 08: Parallel Processor Design (introduction) Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering

More information

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University

Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University Moore s Law Moore, Cramming more components onto integrated circuits, Electronics, 1965. 2 3 Multi-Core Idea:

More information

Real instruction set architectures. Part 2: a representative sample

Real instruction set architectures. Part 2: a representative sample Real instruction set architectures Part 2: a representative sample Some historical architectures VAX: Digital s line of midsize computers, dominant in academia in the 70s and 80s Characteristics: Variable-length

More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

CPU Architecture Overview. Varun Sampath CIS 565 Spring 2012

CPU Architecture Overview. Varun Sampath CIS 565 Spring 2012 CPU Architecture Overview Varun Sampath CIS 565 Spring 2012 Objectives Performance tricks of a modern CPU Pipelining Branch Prediction Superscalar Out-of-Order (OoO) Execution Memory Hierarchy Vector Operations

More information

Embedded RISC Microcontroller Core ARM7TDMI

Embedded RISC Microcontroller Core ARM7TDMI Features 32-bit RISC Architecture Two Instruction Sets: ARM High-performance 32-bit Instruction Set Thumb High-code-density 16-bit Instruction Set Very Low Power Consumption: Industry-leader in MIPS/Watt

More information

In-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution

In-order vs. Out-of-order Execution. In-order vs. Out-of-order Execution In-order vs. Out-of-order Execution In-order instruction execution instructions are fetched, executed & committed in compilergenerated order if one instruction stalls, all instructions behind it stall

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

Organisasi Sistem Komputer

Organisasi Sistem Komputer LOGO Organisasi Sistem Komputer OSK 11 Superscalar Pendidikan Teknik Elektronika FT UNY What is Superscalar? Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed

More information

omputer Design Concept adao Nakamura

omputer Design Concept adao Nakamura omputer Design Concept adao Nakamura akamura@archi.is.tohoku.ac.jp akamura@umunhum.stanford.edu 1 1 Pascal s Calculator Leibniz s Calculator Babbage s Calculator Von Neumann Computer Flynn s Classification

More information

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer

More information

Instruction Set And Architectural Features Of A Modern Risc Processor

Instruction Set And Architectural Features Of A Modern Risc Processor Instruction Set And Architectural Features Of A Modern Risc Processor PowerPC, as an evolving instruction set, has since 2006 been named Power 1 History, 2 Design features The result was the POWER instruction

More information

CS 101, Mock Computer Architecture

CS 101, Mock Computer Architecture CS 101, Mock Computer Architecture Computer organization and architecture refers to the actual hardware used to construct the computer, and the way that the hardware operates both physically and logically

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

Datapoint 2200 IA-32. main memory. components. implemented by Intel in the Nicholas FitzRoy-Dale

Datapoint 2200 IA-32. main memory. components. implemented by Intel in the Nicholas FitzRoy-Dale Datapoint 2200 IA-32 Nicholas FitzRoy-Dale At the forefront of the computer revolution - Intel Difficult to explain and impossible to love - Hennessy and Patterson! Released 1970! 2K shift register main

More information

ARM Processors for Embedded Applications

ARM Processors for Embedded Applications ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or

More information

Processor Structure and Function

Processor Structure and Function WEEK 4 + Chapter 14 Processor Structure and Function + Processor Organization Processor Requirements: Fetch instruction The processor reads an instruction from memory (register, cache, main memory) Interpret

More information

Announcement. Computer Architecture (CSC-3501) Lecture 25 (24 April 2008) Chapter 9 Objectives. 9.2 RISC Machines

Announcement. Computer Architecture (CSC-3501) Lecture 25 (24 April 2008) Chapter 9 Objectives. 9.2 RISC Machines Announcement Computer Architecture (CSC-3501) Lecture 25 (24 April 2008) Seung-Jong Park (Jay) http://wwwcsclsuedu/~sjpark 1 2 Chapter 9 Objectives 91 Introduction Learn the properties that often distinguish

More information

administrivia final hour exam next Wednesday covers assembly language like hw and worksheets

administrivia final hour exam next Wednesday covers assembly language like hw and worksheets administrivia final hour exam next Wednesday covers assembly language like hw and worksheets today last worksheet start looking at more details on hardware not covered on ANY exam probably won t finish

More information

Parallelism. Execution Cycle. Dual Bus Simple CPU. Pipelining COMP375 1

Parallelism. Execution Cycle. Dual Bus Simple CPU. Pipelining COMP375 1 Pipelining COMP375 Computer Architecture and dorganization Parallelism The most common method of making computers faster is to increase parallelism. There are many levels of parallelism Macro Multiple

More information

Fundamentals of Computer Design

Fundamentals of Computer Design Fundamentals of Computer Design Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University

More information

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13 Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13 Moore s Law Moore, Cramming more components onto integrated circuits, Electronics,

More information

Keywords and Review Questions

Keywords and Review Questions Keywords and Review Questions lec1: Keywords: ISA, Moore s Law Q1. Who are the people credited for inventing transistor? Q2. In which year IC was invented and who was the inventor? Q3. What is ISA? Explain

More information

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines

CS450/650 Notes Winter 2013 A Morton. Superscalar Pipelines CS450/650 Notes Winter 2013 A Morton Superscalar Pipelines 1 Scalar Pipeline Limitations (Shen + Lipasti 4.1) 1. Bounded Performance P = 1 T = IC CPI 1 cycletime = IPC frequency IC IPC = instructions per

More information

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley Computer Systems Architecture I CSE 560M Lecture 10 Prof. Patrick Crowley Plan for Today Questions Dynamic Execution III discussion Multiple Issue Static multiple issue (+ examples) Dynamic multiple issue

More information

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language. Architectures & instruction sets Computer architecture taxonomy. Assembly language. R_B_T_C_ 1. E E C E 2. I E U W 3. I S O O 4. E P O I von Neumann architecture Memory holds data and instructions. Central

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Fundamentals of Computers Design

Fundamentals of Computers Design Computer Architecture J. Daniel Garcia Computer Architecture Group. Universidad Carlos III de Madrid Last update: September 8, 2014 Computer Architecture ARCOS Group. 1/45 Introduction 1 Introduction 2

More information

Final Lecture. A few minutes to wrap up and add some perspective

Final Lecture. A few minutes to wrap up and add some perspective Final Lecture A few minutes to wrap up and add some perspective 1 2 Instant replay The quarter was split into roughly three parts and a coda. The 1st part covered instruction set architectures the connection

More information

Performance of computer systems

Performance of computer systems Performance of computer systems Many different factors among which: Technology Raw speed of the circuits (clock, switching time) Process technology (how many transistors on a chip) Organization What type

More information

CISC, RISC and post-risc architectures

CISC, RISC and post-risc architectures Microprocessor architecture and instruction execution CISC, RISC and Post-RISC architectures Instruction Set Architecture and microarchitecture Instruction encoding and machine instructions Pipelined and

More information

Processors, Performance, and Profiling

Processors, Performance, and Profiling Processors, Performance, and Profiling Architecture 101: 5-Stage Pipeline Fetch Decode Execute Memory Write-Back Registers PC FP ALU Memory Architecture 101 1. Fetch instruction from memory. 2. Decode

More information

Microelectronics. Moore s Law. Initially, only a few gates or memory cells could be reliably manufactured and packaged together.

Microelectronics. Moore s Law. Initially, only a few gates or memory cells could be reliably manufactured and packaged together. Microelectronics Initially, only a few gates or memory cells could be reliably manufactured and packaged together. These early integrated circuits are referred to as small-scale integration (SSI). As time

More information

ECE 2162 Intro & Trends. Jun Yang Fall 2009

ECE 2162 Intro & Trends. Jun Yang Fall 2009 ECE 2162 Intro & Trends Jun Yang Fall 2009 Prerequisites CoE/ECE 0142: Computer Organization; or CoE/CS 1541: Introduction to Computer Architecture I will assume you have detailed knowledge of Pipelining

More information

Review Questions. 1 The DRAM problem [5 points] Suggest a solution. 2 Big versus Little Endian Addressing [5 points]

Review Questions. 1 The DRAM problem [5 points] Suggest a solution. 2 Big versus Little Endian Addressing [5 points] Review Questions 1 The DRAM problem [5 points] Suggest a solution 2 Big versus Little Endian Addressing [5 points] Consider the 32-bit hexadecimal number 0x21d3ea7d. 1. What is the binary representation

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

CS425 Computer Systems Architecture

CS425 Computer Systems Architecture CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order

More information

CISC RISC. Compiler. Compiler. Processor. Processor

CISC RISC. Compiler. Compiler. Processor. Processor Q1. Explain briefly the RISC design philosophy. Answer: RISC is a design philosophy aimed at delivering simple but powerful instructions that execute within a single cycle at a high clock speed. The RISC

More information

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: MIPS Instruction Set Architecture

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: MIPS Instruction Set Architecture Computer Science 324 Computer Architecture Mount Holyoke College Fall 2007 Topic Notes: MIPS Instruction Set Architecture vonneumann Architecture Modern computers use the vonneumann architecture. Idea:

More information

These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information.

These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information. 11 1 This Set 11 1 These slides do not give detailed coverage of the material. See class notes and solved problems (last page) for more information. Text covers multiple-issue machines in Chapter 4, but

More information

Intel Architecture for Software Developers

Intel Architecture for Software Developers Intel Architecture for Software Developers 1 Agenda Introduction Processor Architecture Basics Intel Architecture Intel Core and Intel Xeon Intel Atom Intel Xeon Phi Coprocessor Use Cases for Software

More information

Von Neumann architecture. The first computers used a single fixed program (like a numeric calculator).

Von Neumann architecture. The first computers used a single fixed program (like a numeric calculator). Microprocessors Von Neumann architecture The first computers used a single fixed program (like a numeric calculator). To change the program, one has to re-wire, re-structure, or re-design the computer.

More information

William Stallings Computer Organization and Architecture 10 th Edition Pearson Education, Inc., Hoboken, NJ. All rights reserved.

William Stallings Computer Organization and Architecture 10 th Edition Pearson Education, Inc., Hoboken, NJ. All rights reserved. + William Stallings Computer Organization and Architecture 10 th Edition 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. 2 + Chapter 14 Processor Structure and Function + Processor Organization

More information

UNIT I (Two Marks Questions & Answers)

UNIT I (Two Marks Questions & Answers) UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-

More information

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-11: 80x86 Architecture

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-11: 80x86 Architecture ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-11: 80x86 Architecture 1 The 80x86 architecture processors popular since its application in IBM PC (personal computer). 2 First Four generations

More information

Tutorial 11. Final Exam Review

Tutorial 11. Final Exam Review Tutorial 11 Final Exam Review Introduction Instruction Set Architecture: contract between programmer and designers (e.g.: IA-32, IA-64, X86-64) Computer organization: describe the functional units, cache

More information

Communications and Computer Engineering II: Lecturer : Tsuyoshi Isshiki

Communications and Computer Engineering II: Lecturer : Tsuyoshi Isshiki Communications and Computer Engineering II: Microprocessor 2: Processor Micro-Architecture Lecturer : Tsuyoshi Isshiki Dept. Communications and Computer Engineering, Tokyo Institute of Technology isshiki@ict.e.titech.ac.jp

More information

Lecture 8: Instruction Fetch, ILP Limits. Today: advanced branch prediction, limits of ILP (Sections , )

Lecture 8: Instruction Fetch, ILP Limits. Today: advanced branch prediction, limits of ILP (Sections , ) Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections 3.4-3.5, 3.8-3.14) 1 1-Bit Prediction For each branch, keep track of what happened last time and use

More information

Basic Computer Architecture

Basic Computer Architecture Basic Computer Architecture CSCE 496/896: Embedded Systems Witawas Srisa-an Review of Computer Architecture Credit: Most of the slides are made by Prof. Wayne Wolf who is the author of the textbook. I

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism Software View of Computer Architecture COMP2 Godfrey van der Linden 200-0-0 Introduction Definition of Instruction Level Parallelism(ILP) Pipelining Hazards & Solutions Dynamic

More information

ECE 587 Advanced Computer Architecture I

ECE 587 Advanced Computer Architecture I ECE 587 Advanced Computer Architecture I Instructor: Alaa Alameldeen alaa@ece.pdx.edu Spring 2015 Portland State University Copyright by Alaa Alameldeen and Haitham Akkary 2015 1 When and Where? When:

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture

More information

ARM ARCHITECTURE. Contents at a glance:

ARM ARCHITECTURE. Contents at a glance: UNIT-III ARM ARCHITECTURE Contents at a glance: RISC Design Philosophy ARM Design Philosophy Registers Current Program Status Register(CPSR) Instruction Pipeline Interrupts and Vector Table Architecture

More information

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 12 Mahadevan Gomathisankaran March 4, 2010 03/04/2010 Lecture 12 CSCE 4610/5610 1 Discussion: Assignment 2 03/04/2010 Lecture 12 CSCE 4610/5610 2 Increasing Fetch

More information

Computer & Microprocessor Architecture HCA103

Computer & Microprocessor Architecture HCA103 Computer & Microprocessor Architecture HCA103 Computer Evolution and Performance UTM-RHH Slide Set 2 1 ENIAC - Background Electronic Numerical Integrator And Computer Eckert and Mauchly University of Pennsylvania

More information

Parallel Systems I The GPU architecture. Jan Lemeire

Parallel Systems I The GPU architecture. Jan Lemeire Parallel Systems I The GPU architecture Jan Lemeire 2012-2013 Sequential program CPU pipeline Sequential pipelined execution Instruction-level parallelism (ILP): superscalar pipeline out-of-order execution

More information

ECE 486/586. Computer Architecture. Lecture # 8

ECE 486/586. Computer Architecture. Lecture # 8 ECE 486/586 Computer Architecture Lecture # 8 Spring 2015 Portland State University Lecture Topics Instruction Set Principles MIPS Control flow instructions Dealing with constants IA-32 Fallacies and Pitfalls

More information

Computer Organization. 8 th Edition. Chapter 2 p Computer Evolution and Performance

Computer Organization. 8 th Edition. Chapter 2 p Computer Evolution and Performance William Stallings Computer Organization and Architecture 8 th Edition Chapter 2 p Computer Evolution and Performance ENIAC - background Electronic Numerical Integrator And Computer Eckert and Mauchly University

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

Pentium IV-XEON. Computer architectures M

Pentium IV-XEON. Computer architectures M Pentium IV-XEON Computer architectures M 1 Pentium IV block scheme 4 32 bytes parallel Four access ports to the EU 2 Pentium IV block scheme Address Generation Unit BTB Branch Target Buffer I-TLB Instruction

More information

Comp. Org II, Spring

Comp. Org II, Spring Lecture 11 Parallel Processor Architectures Flynn s taxonomy from 1972 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing (Sta09 Fig 17.1) 2 Parallel

More information