Professor Lee, Yong Surk. References 고성능마이크로프로세서구조의개요. Topics Microprocessor & microcontroller

Size: px
Start display at page:

Download "Professor Lee, Yong Surk. References 고성능마이크로프로세서구조의개요. Topics Microprocessor & microcontroller"

Transcription

1 이강좌는 C & S Technology 사의지원으로제작되었으며 copyright가없으므로비영리적인목적에한하여누구든지복사, 배포가가능합니다. 연구실홈페이지에는고성능마이크로프로세서에관련된많은강좌가있으며누구나무료로다운로드받을수있습니다. Professor Lee, Yong Surk 1973 : B.S., Electrical Eng., Yonsei niv : Ph.D,, niv. of ichigan, Ann Arbor 1982 ~ 1992 : Designe microprocessors in silicon valley, California Designe Pentium at Intel (1989 ~ 1992) 1993 ~ : Professor at Yonsei niversity 1 2 고성능마이크로프로세서구조의개요 (High Performance icroprocessor Architecture Overview) 연세대학교전기전자공학과 이용석교수 Homepage: yonglee@yonsei.ac.kr 전화 : References [1] J.L.Hennessy & D.A.Patterson, Computer Architecture, a Quantitative Approach, Secon Eition, organ Kaufmann Publishers, 1996 [2] N.Alexanriis, Design of icroprocessor Base Systems, Prentice Hall, [3] D. Sima,, T. Fountain, P. Kacsuk, Avance Computer Architectures, a Design Space Approach, Aison - esley, 1997 [4] IEEE Stanar Committee, IEEE Stanar for Binary Floating-Point Arithmetic, ANSI / IEEE St [5] icroprocessor Report,.PRONLINE.CO 5 Topics icroprocessor & microcontroller & C Superscalar & VLI Pipelining Branch strategy Cache memory / Floating point unit ing Top-own esign 6

2 µ P Classification Performance Low eium High 4, 8 bits 16, 32 bits 32, 64 bits transistors Below a few tens of 1000s Above a few million Zilog Z-80, Intel 8051 AR, Hitachi SH Pentium, ltrasparc, Alpha 7 µp P vs. µc C (1) Data with freq Architecture Cache,, floating point unit, pipelining Design methoology icro - processor 32, 64 bits 4, 8, 16, 32 bits Above a few hunre Hz (Harvar) se Top - own icro- controller Below a few hunre Hz (Von Neumann) Not use Top own Bottom -up 8 µp P vs. µc C (2) ultiplier Divier icro - processor Booth multiplier SRT ivier icro - controller ses AL ses AL (Complex Set Computer) Transistor Application Above a few million Computer CP Below a few million Controller (Reuce Set Computer) Cost High Low 9 10 vs. (1) (Ref.[1, 2]) instruction length execution time Pipelining (Pentium) A few hunres Variable ( bytes) Variable (1-300 clk/ inst) Not efficient A few tens Fixe (44 bytes) Fixe (1 clk/ inst) Efficient 11 vs. (2) Control registers Data memory access Design effort (Pentium) icrocoe Small ( about 8) emory operans 5-10 X 600(man man- year) Harwire Large ( ) Loa, store only 1 X 60(man man- year) 12

3 em = Temp 1 Temp 2 em R 1 +em em R 1 +Temp 1 Temp 2 R 2 R 3 em em R 1 + R 2 R3 microinstruction = instruction 13 Intel Pentium4 () Prefix 0 4 Op coe o R / SIB (bytes) Ar isp Imm ata ~17 byte length 14 Alpha () icroprocessor Spee (per task) AL 31 0 Opcoe Ra Rb SBZ 0 Function Rc = (NI( ) X (CPI( ) X C Branch Opcoe Ra emor_isp NI : execute s Loa, Store Floating point Opcoe Opcoe Ra Fa Rb Fb emor_isp Function Fc 15 CPI : s Per C : perio (=1/ clock frequency) 16 NI ( execute Inst ) CPI ( Per Inst) C (1/ frequency) Spee 1X 1X - 300X 1X - 2X Slow 2X 1X 1X Fast 17 Benchmark IPS (illion Inst Per Secon) VAX IPS - VAX 11/780 Dhrystone, hetstone, Linpack - benchmark engineering SPEC int - 89, 92, 95, 00, 04 SPEC SPEC fp - 89, 92, 95, 00, 04 (.SPEC.ORG) 18

4 Alpha 21264C IPS R14000 Sun ltra- Sparc-3 Intel Pentium-4 Five Stage Pipeline (Ref.[1, 2]) freq. transistor Issue rate SPEC00b (int/ fp) Power consumption 1000Hz 500Hz 1000Hz 15.4m 7.2m 29m 2000Hz 42m / / / / icroprocessor Report, December, 2001 (.PRONLINE.CO) F : Inst fetch (inst access) & inc D : Inst ecoe & regrea E : AL op or ar calculation : Data access : Reg writeback 20 Alpha Pipeline F D Arr Iss E1 E2 Scalar µp F : Fetch D : Decoe Arr : Arrange to issue Issue : Issue & reg rea E 1 : AL operation E 2 : Data access : rite back 21 I #1 I #2 I #3 I #4 I #5 1 inst per clock (I) 22 Superscalar µp I #1 I #2 I #3 I #4 I #5 I #6 2 inst per clock (I) 23 Superscalar Data epenency R 1 R 2 + R 3 R 5 R 1 + R 4 Compiler minimizes ata epenency Dynamic scheuling by harware ost superscalar µps can issue 3 6 instructions maximum per cycle Compatibility 24

5 VLI (Very Long or) µp Loa Store A Compare FP a FP mult Branch em #1 em #2 Int AL Int AL FP AL FP mult ultiporte register file Branch unit 25 VLI (Ref. [3], p176) Static scheuling by compiler Complex compiler, simple harware No compatibility Cannot program in assembly language 26 Scalar µp Havar Architecture I #1 I #2 I #3 I #4 I #5 F D E emory access 27 CP Inst bus Data bus Inst memory Data memory 28 Von Neumann Architecture Sub R 3, R 1, R 2 ; R 3 R 1 - R 2 CP Inst bus Data bus nifie memory F : Inst fetch & inc D : Inst ecoe & R 1, R 2 rea E : Temp R 1 - R 2, status available Two - port memory : No op : : R 3 temp ; write status reg 29 30

6 Loa R 3, R 1, R 2 ; R 3 [R 1 + R 2 ] Store R 1, R 2, R 3 ; [R 1 + R 2 ] R 3 F : Inst fetch & inc F : Inst fetch & inc D : Inst ecoe & R 1, R 2 rea D : Inst ecoe & R 1, R 2, R 3 rea E : R 1 + R 2 ; ar calculation E : R 1 + R 2 ; ar calculation : [R 1 + R 2 ] ; ata rea : [R 1 + R 2 ] R3 : R 3 [R 1 + R 2 ] 31 : No op ; ata write 32 Br 85 ; Branch to + 85 F : Inst fetch & inc + 4 X FF FF FF FF A BR ADDR D : Inst ecoe & + 85 calculation E : + 85 ; branch : No op Inst D e c o e R E G X A L Data X : No op Ege Triggere S Flip-Flop Flop Inst Cache iss Cache miss penalty D Q D FF Q I #1 I #2 I #3 I #4 I #5 Inst miss 35 36

7 Loa Interlock Frozen R 1 [R 2 +85] Bypass In 0 X 1 D Q Flip flop Out Avance R 4 R 1 + R 3 XXX Freeze 37 0 : Freeze 1 : Avance 38 In D Q Flip flop Out Data Bypassing (Forwaring) AND Freeze Freeze 1 cycle frozen R 1 R 2 +R 3 R 5 R 1 -R 4 Bypass FF FF FF FF X Inst D e c o e A R E G BR ADDR X A L Bypass Data X 41 Branch Strategy (Ref.[1]) BTB (Branch Target Buffer) Delaye branch Preict - not - taken 42

8 Delaye Branch Branch Delay slot Target fetch F D E F D E Target aress F D E Compiler has to fill the elay slot 43 Preict - not - taken ntaken branch i + 1 i + 2 Taken branch F i + 1 Target aress Branch target fetch Flushe! XXX XXX XXX XXX 44 µp P System Fast CP Reg file SRA Cache mem DRA ain mem Slow H.D. 8 reg 512KB 256B 30GB icroprocessor Frequency (Perio) 10Hz(100nS) 100Hz(10nS) DRA Access Time 100nS 50nS Block size (16 64B) Page size (2K 8KB) Hz(1nS) 25nS 46 Aress Translation 2 32 = 4G 4KB 4KB 0 Virtual (logical) aress emory anagement nit KB 4KB Physical aress 47 IEEE Floating Point Stanar (Ref. [4]) Single precision Double precision bits S E F bits S E F υ = (-1)( S. e 2 (1.F), e = E - bias 48

9 0 Integer < < Single precision Double precision < ing 1 GHz clock (= 1nS perio) 30cm 2cm 2cm µp P chip FF 1 FF 2 1nS t 3cm metal ( t=0.1ns) Top-Down Design Steps 1. Simulator (C) 20% 2. HDL moel 20% 3. Verification 20% 4. Synthesis, full custom 20% 5. Verification 20% Total 100% 6. Fault graing aitional 20% 51

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017 ECE 550D Funamentals of Computer Systems an Engineering Fall 017 Datapaths Prof. John Boar Duke University Slies are erive from work by Profs. Tyler Bletch an Anrew Hilton (Duke) an Amir Roth (Penn) What

More information

Computer Organization

Computer Organization Computer Organization Douglas Comer Computer Science Department Purue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purue.eu/people/comer Copyright 2006. All rights reserve.

More information

Caches and Memory. Anne Bracy CS 3410 Computer Science Cornell University. See P&H Chapter: , 5.8, 5.10, 5.13, 5.15, 5.17

Caches and Memory. Anne Bracy CS 3410 Computer Science Cornell University. See P&H Chapter: , 5.8, 5.10, 5.13, 5.15, 5.17 Caches and emory Anne Bracy CS 34 Computer Science Cornell University Slides by Anne Bracy with 34 slides by Professors Weatherspoon, Bala, ckee, and Sirer. See P&H Chapter: 5.-5.4, 5.8, 5., 5.3, 5.5,

More information

Computer Architectures. DLX ISA: Pipelined Implementation

Computer Architectures. DLX ISA: Pipelined Implementation Computer Architectures L ISA: Pipelined Implementation 1 The Pipelining Principle Pipelining is nowadays the main basic technique deployed to speed-up a CP. The key idea for pipelining is general, and

More information

You Can Do That. Unit 16. Motivation. Computer Organization. Computer Organization Design of a Simple Processor. Now that you have some understanding

You Can Do That. Unit 16. Motivation. Computer Organization. Computer Organization Design of a Simple Processor. Now that you have some understanding .. ou Can Do That Unit Computer Organization Design of a imple Clou & Distribute Computing (CyberPhysical, bases, Mining,etc.) Applications (AI, Robotics, Graphics, Mobile) ystems & Networking (Embee ystems,

More information

EECS 322 Computer Architecture Superpipline and the Cache

EECS 322 Computer Architecture Superpipline and the Cache EECS 322 Computer Architecture Superpipline and the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow Summary:

More information

The Alpha and Microprocessors:

The Alpha and Microprocessors: The Alpha 21364 and 21464 icroprocessors: Continuing the Performance Lead Beyond Y2K Shubu ukherjee, Ph.D. Principal Hardware Engineer VSSAD Labs, Alpha Development Group Compaq Computer Corporation Shrewsbury,

More information

Computer Architecture s Changing Definition

Computer Architecture s Changing Definition Computer Architecture s Changing Definition 1950s Computer Architecture Computer Arithmetic 1960s Operating system support, especially memory management 1970s to mid 1980s Computer Architecture Instruction

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture

More information

Computer Architecture. Introduction. Lynn Choi Korea University

Computer Architecture. Introduction. Lynn Choi Korea University Computer Architecture Introduction Lynn Choi Korea University Class Information Lecturer Prof. Lynn Choi, School of Electrical Eng. Phone: 3290-3249, 공학관 411, lchoi@korea.ac.kr, TA: 윤창현 / 신동욱, 3290-3896,

More information

Chapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors)

Chapter Seven. SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) Chapter Seven emories: Review SRA: value is stored on a pair of inverting gates very fast but takes up more space than DRA (4 to transistors) DRA: value is stored as a charge on capacitor (must be refreshed)

More information

The Pentium II/III Processor Compiler on a Chip

The Pentium II/III Processor Compiler on a Chip The Pentium II/III Processor Compiler on a Chip Ronny Ronen Senior Principal Engineer Director of Architecture Research Intel Labs - Haifa Intel Corporation Tel Aviv University January 20, 2004 1 Agenda

More information

Alex Milenkovich 1. CPE/EE 421 Microcomputers. CPE/EE 421 Microcomputers U A H U A H U A H. Instructor: Dr Aleksandar Milenkovic Lecture Notes S01

Alex Milenkovich 1. CPE/EE 421 Microcomputers. CPE/EE 421 Microcomputers U A H U A H U A H. Instructor: Dr Aleksandar Milenkovic Lecture Notes S01 CPE/EE 42 Microcomputers Instructor: Dr Aleksandar Milenkovic Lecture Notes S0 *Material used is in part developed by Dr. D. Raskovic and Dr. E. Jovanov CPE/EE 42/52 Microcomputers CPE/EE 42 Microcomputers

More information

omputer Design Concept adao Nakamura

omputer Design Concept adao Nakamura omputer Design Concept adao Nakamura akamura@archi.is.tohoku.ac.jp akamura@umunhum.stanford.edu 1 1 Pascal s Calculator Leibniz s Calculator Babbage s Calculator Von Neumann Computer Flynn s Classification

More information

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra Binary Representation Computer Systems Information is represented as a sequence of binary digits: Bits What the actual bits represent depends on the context: Seminar 3 Numerical value (integer, floating

More information

EECS 322 Computer Architecture Improving Memory Access: the Cache

EECS 322 Computer Architecture Improving Memory Access: the Cache EECS 322 Computer Architecture Improving emory Access: the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow

More information

Lecture 4: RISC Computers

Lecture 4: RISC Computers Lecture 4: RISC Computers Introduction Program execution features RISC characteristics RISC vs. CICS Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) represents an important

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 17 Advanced Processors I 2005-10-27 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: David Marquardt and Udam Saini www-inst.eecs.berkeley.edu/~cs152/

More information

Ch 5: Designing a Single Cycle Datapath

Ch 5: Designing a Single Cycle Datapath Ch 5: esigning a Single Cycle path Computer Systems Architecture CS 365 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Memory path Input Output Today s Topic:

More information

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard Consider: a = b + c; d = e - f; Assume loads have a latency of one clock cycle:

More information

Design Objectives of the 0.35µm Alpha Microprocessor (A 500MHz Quad Issue RISC Microprocessor)

Design Objectives of the 0.35µm Alpha Microprocessor (A 500MHz Quad Issue RISC Microprocessor) Design Objectives of the 0.35µm Alpha 21164 Microprocessor (A 500MHz Quad Issue RISC Microprocessor) Gregg Bouchard Digital Semiconductor Digital Equipment Corporation Hudson, MA 1 Outline 0.35µm Alpha

More information

Typical Processor Execution Cycle

Typical Processor Execution Cycle Typical Processor Execution Cycle Instruction Fetch Obtain instruction from program storage Instruction Decode Determine required actions and instruction size Operand Fetch Locate and obtain operand data

More information

SimCore/Alpha Functional Simulator Version 1.0 : Simple and Readable Alpha Processor Simulator

SimCore/Alpha Functional Simulator Version 1.0 : Simple and Readable Alpha Processor Simulator SimCore/Alpha Functional Simulator Version 1.0 : Simple and Readable Alpha Processor Simulator Kenji KISE kis@is.uec.ac.jp Graduate School of Information Systems, University of Electro-Communications 2003-09-28

More information

CPE/EE 421 Microcomputers

CPE/EE 421 Microcomputers CPE/EE 421 Microcomputers Instructor: Dr Aleksandar Milenkovic Lecture Notes S01 *Material used is in part developed by Dr. D. Raskovic and Dr. E. Jovanov CPE/EE 421/521 Microcomputers 1 CPE/EE 421 Microcomputers

More information

Jim Keller. Digital Equipment Corp. Hudson MA

Jim Keller. Digital Equipment Corp. Hudson MA Jim Keller Digital Equipment Corp. Hudson MA ! Performance - SPECint95 100 50 21264 30 21164 10 1995 1996 1997 1998 1999 2000 2001 CMOS 5 0.5um CMOS 6 0.35um CMOS 7 0.25um "## Continued Performance Leadership

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 20 Advanced Processors I 2005-4-5 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/ Last

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA The Alpha 21264 Microprocessor: Out-of-Order ution at 600 Mhz R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA 1 Some Highlights z Continued Alpha performance leadership y 600 Mhz operation in

More information

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor The A High Performance Out-of-Order Processor Hot Chips VIII IEEE Computer Society Stanford University August 19, 1996 Hewlett-Packard Company Engineering Systems Lab - Fort Collins, CO - Cupertino, CA

More information

CSC 631: High-Performance Computer Architecture

CSC 631: High-Performance Computer Architecture CSC 631: High-Performance Computer Architecture Spring 2017 Lecture 4: Pipelining Last Time in Lecture 3 icrocoding, an effective technique to manage control unit complexity, invented in era when logic

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

GRE Architecture Session

GRE Architecture Session GRE Architecture Session Session 2: Saturday 23, 1995 Young H. Cho e-mail: youngc@cs.berkeley.edu www: http://http.cs.berkeley/~youngc Y. H. Cho Page 1 Review n Homework n Basic Gate Arithmetics n Bubble

More information

CS 3410, Spring 2014 Computer Science Cornell University. See P&H Chapter: , 5.8, 5.15

CS 3410, Spring 2014 Computer Science Cornell University. See P&H Chapter: , 5.8, 5.15 CS 34, Spring 4 Computer Science Cornell University See P& Chapter: 5.- 5.4, 5.8, 5.5 Code Stored in emory (also, data and stack) memory PC +4 new pc inst control extend imm B A compute jump/branch targets

More information

Alex Milenkovich 1 CPE 323. CPE 323 Introduction to Embedded Computer Systems: Introduction. Instructor: Dr Aleksandar Milenkovic Lecture Notes

Alex Milenkovich 1 CPE 323. CPE 323 Introduction to Embedded Computer Systems: Introduction. Instructor: Dr Aleksandar Milenkovic Lecture Notes CPE 323 CPE 323 Introduction to Embedded Computer Systems: Introduction Instructor: Dr Aleksandar Milenkovic Lecture Notes Syllabus textbook & other references grading policy important dates course outline

More information

Superscalar Processors

Superscalar Processors Superscalar Processors Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any a performance

More information

15-740/ Computer Architecture Lecture 4: Pipelining. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 4: Pipelining. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 4: Pipelining Prof. Onur Mutlu Carnegie Mellon University Last Time Addressing modes Other ISA-level tradeoffs Programmer vs. microarchitect Virtual memory Unaligned

More information

THE MICROPROCESSOR Von Neumann s Architecture Model

THE MICROPROCESSOR Von Neumann s Architecture Model THE ICROPROCESSOR Von Neumann s Architecture odel Input/Output unit Provides instructions and data emory unit Stores both instructions and data Arithmetic and logic unit Processes everything Control unit

More information

Lecture 3 Machine Language. Instructions: Instruction Execution cycle. Speaking computer before voice recognition interfaces

Lecture 3 Machine Language. Instructions: Instruction Execution cycle. Speaking computer before voice recognition interfaces Lecture 3 Machine Language Speaking computer before voice recognition interfaces 1 Instructions: Language of the Machine More primitive than higher level languages e.g., no sophisticated control flow Very

More information

Advanced Processor Architecture

Advanced Processor Architecture Advanced Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong

More information

CS 152, Spring 2011 Section 10

CS 152, Spring 2011 Section 10 CS 152, Spring 2011 Section 10 Christopher Celio University of California, Berkeley Agenda Stuff (Quiz 4 Prep) http://3dimensionaljigsaw.wordpress.com/2008/06/18/physics-based-games-the-new-genre/ Intel

More information

Chapter 13 Reduced Instruction Set Computers

Chapter 13 Reduced Instruction Set Computers Chapter 13 Reduced Instruction Set Computers Contents Instruction execution characteristics Use of a large register file Compiler-based register optimization Reduced instruction set architecture RISC pipelining

More information

General Purpose Processors

General Purpose Processors Calcolatori Elettronici e Sistemi Operativi Specifications Device that executes a program General Purpose Processors Program list of instructions Instructions are stored in an external memory Stored program

More information

Lecture 8: RISC & Parallel Computers. Parallel computers

Lecture 8: RISC & Parallel Computers. Parallel computers Lecture 8: RISC & Parallel Computers RISC vs CISC computers Parallel computers Final remarks Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) is an important innovation in computer

More information

Lecture 4: RISC Computers

Lecture 4: RISC Computers Lecture 4: RISC Computers Introduction Program execution features RISC characteristics RISC vs. CICS Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) is an important innovation

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Advanced Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000

More information

Superscalar Machines. Characteristics of superscalar processors

Superscalar Machines. Characteristics of superscalar processors Superscalar Machines Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any performance

More information

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Performance COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Performance COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals What is Performance? How do we measure the performance of

More information

Caches and Memory Deniz Altinbuken CS 3410, Spring 2015

Caches and Memory Deniz Altinbuken CS 3410, Spring 2015 s and emory Deniz Altinbuken CS, Spring Computer Science Cornell University See P& Chapter:.-. (except writes) Big Picture: emory Code Stored in emory (also, data and stack) compute jump/branch targets

More information

Computer Architecture 计算机体系结构. Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II. Chao Li, PhD. 李超博士

Computer Architecture 计算机体系结构. Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II. Chao Li, PhD. 李超博士 Computer Architecture 计算机体系结构 Lecture 4. Instruction-Level Parallelism II 第四讲 指令级并行 II Chao Li, PhD. 李超博士 SJTU-SE346, Spring 2018 Review Hazards (data/name/control) RAW, WAR, WAW hazards Different types

More information

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Pipelining Recall Pipelining is parallelizing execution Key to speedups in processors Split instruction

More information

Pipelining, Branch Prediction, Trends

Pipelining, Branch Prediction, Trends Pipelining, Branch Prediction, Trends 10.1-10.4 Topics 10.1 Quantitative Analyses of Program Execution 10.2 From CISC to RISC 10.3 Pipelining the Datapath Branch Prediction, Delay Slots 10.4 Overlapping

More information

Lecture 4: Instruction Set Architecture

Lecture 4: Instruction Set Architecture Lecture 4: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation Reading: Textbook (5 th edition) Appendix A Appendix B (4 th edition)

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Digital Semiconductor. StrongARMARM

Digital Semiconductor. StrongARMARM 3TRONG!2-3!! -HZ B 7 #-/3!2-0ROCESSOR 3RIBALAN 3ANTHANAM $IGITAL %QUIPMENT #ORPORATION (OT #HIPS /VERVIEW u Highlights u Design choices u µarchitecture details u Powerdown Modes u Measured Results u Performance

More information

Case Study : Transmeta s Crusoe

Case Study : Transmeta s Crusoe Case Study : Transmeta s Crusoe Motivation David Ditzel (SUN microsystems) observed that Microprocessor complexity is getting worse, and they consume too much power. This led to the birth of Crusoe (nicknamed

More information

Chapter 2 Lecture 1 Computer Systems Organization

Chapter 2 Lecture 1 Computer Systems Organization Chapter 2 Lecture 1 Computer Systems Organization This chapter provides an introduction to the components Processors: Primary Memory: Secondary Memory: Input/Output: Busses The Central Processing Unit

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?) Evolution of Processor Performance So far we examined static & dynamic techniques to improve the performance of single-issue (scalar) pipelined CPU designs including: static & dynamic scheduling, static

More information

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by:

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row (~every 8 msec). Static RAM may be

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Number Representation 09212011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Recap Logic Circuits for Register Transfer

More information

RISC, CISC, and ISA Variations

RISC, CISC, and ISA Variations RISC, CISC, and ISA Variations CS 3410 Computer System Organization & Programming These slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. iclicker

More information

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor 1 CPI < 1? How? From Single-Issue to: AKS Scalar Processors Multiple issue processors: VLIW (Very Long Instruction Word) Superscalar processors No ISA Support Needed ISA Support Needed 2 What if dynamic

More information

Computer Architecture

Computer Architecture Computer Architecture Mehran Rezaei m.rezaei@eng.ui.ac.ir Welcome Office Hours: TBA Office: Eng-Building, Last Floor, Room 344 Tel: 0313 793 4533 Course Web Site: eng.ui.ac.ir/~m.rezaei/architecture/index.html

More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information

Math 230 Assembly Programming (AKA Computer Organization) Spring MIPS Intro

Math 230 Assembly Programming (AKA Computer Organization) Spring MIPS Intro Math 230 Assembly Programming (AKA Computer Organization) Spring 2008 MIPS Intro Adapted from slides developed for: Mary J. Irwin PSU CSE331 Dave Patterson s UCB CS152 M230 L09.1 Smith Spring 2008 MIPS

More information

Hardware Speculation Support

Hardware Speculation Support Hardware Speculation Support Conditional instructions Most common form is conditional move BNEZ R1, L ;if MOV R2, R3 ;then CMOVZ R2,R3, R1 L: ;else Other variants conditional loads and stores nullification

More information

LECTURE 10. Pipelining: Advanced ILP

LECTURE 10. Pipelining: Advanced ILP LECTURE 10 Pipelining: Advanced ILP EXCEPTIONS An exception, or interrupt, is an event other than regular transfers of control (branches, jumps, calls, returns) that changes the normal flow of instruction

More information

CPSC614: Computer Architecture

CPSC614: Computer Architecture CPSC614: Computer Architecture E.J. Kim Texas A&M University Computer Science & Engineering Department Assignment 1, Due Thursday Feb/9 Spring 2017 1. A certain benchmark contains 195,700 floating-point

More information

Load1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1

Load1 no Load2 no Add1 Y Sub Reg[F2] Reg[F6] Add2 Y Add Reg[F2] Add1 Add3 no Mult1 Y Mul Reg[F2] Reg[F4] Mult2 Y Div Reg[F6] Mult1 Instruction Issue Execute Write result L.D F6, 34(R2) L.D F2, 45(R3) MUL.D F0, F2, F4 SUB.D F8, F2, F6 DIV.D F10, F0, F6 ADD.D F6, F8, F2 Name Busy Op Vj Vk Qj Qk A Load1 no Load2 no Add1 Y Sub Reg[F2]

More information

Hardware Design I Chap. 10 Design of microprocessor

Hardware Design I Chap. 10 Design of microprocessor Hardware Design I Chap. 0 Design of microprocessor E-mail: shimada@is.naist.jp Outline What is microprocessor? Microprocessor from sequential machine viewpoint Microprocessor and Neumann computer Memory

More information

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath Outline EEL-473 Computer Architecture Designing a Single Cycle path Introduction The steps of designing a processor path and timing for register-register operations path for logical operations with immediates

More information

Single-Cycle Examples, Multi-Cycle Introduction

Single-Cycle Examples, Multi-Cycle Introduction Single-Cycle Examples, ulti-cycle Introduction 1 Today s enu Single cycle examples Single cycle machines vs. multi-cycle machines Why multi-cycle? Comparative performance Physical and Logical Design of

More information

Alternate definition: Instruction Set Architecture (ISA) What is Computer Architecture? Computer Organization. Computer structure: Von Neumann model

Alternate definition: Instruction Set Architecture (ISA) What is Computer Architecture? Computer Organization. Computer structure: Von Neumann model What is Computer Architecture? Structure: static arrangement of the parts Organization: dynamic interaction of the parts and their control Implementation: design of specific building blocks Performance:

More information

EKT 303 WEEK Pearson Education, Inc., Hoboken, NJ. All rights reserved.

EKT 303 WEEK Pearson Education, Inc., Hoboken, NJ. All rights reserved. + EKT 303 WEEK 13 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. + Chapter 15 Reduced Instruction Set Computers (RISC) Table 15.1 Characteristics of Some CISCs, RISCs, and Superscalar

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1996 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue

More information

15-740/ Computer Architecture Lecture 7: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011

15-740/ Computer Architecture Lecture 7: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011 15-740/18-740 Computer Architecture Lecture 7: Pipelining Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/26/2011 Review of Last Lecture More ISA Tradeoffs Programmer vs. microarchitect Transactional

More information

Simultaneous Multithreading (SMT)

Simultaneous Multithreading (SMT) Simultaneous Multithreading (SMT) An evolutionary processor architecture originally introduced in 1995 by Dean Tullsen at the University of Washington that aims at reducing resource waste in wide issue

More information

Microprocessor Architecture Dr. Charles Kim Howard University

Microprocessor Architecture Dr. Charles Kim Howard University EECE416 Microcomputer Fundamentals Microprocessor Architecture Dr. Charles Kim Howard University 1 Computer Architecture Computer System CPU (with PC, Register, SR) + Memory 2 Computer Architecture ALU

More information

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor Single-Issue Processor (AKA Scalar Processor) CPI IPC 1 - One At Best 1 - One At best 1 From Single-Issue to: AKS Scalar Processors CPI < 1? How? Multiple issue processors: VLIW (Very Long Instruction

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

Introduction to Microprocessor

Introduction to Microprocessor Introduction to Microprocessor Slide 1 Microprocessor A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device That reads binary instructions from a storage device

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 19 Advanced Processors III 2006-11-2 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/ 1 Last

More information

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1 Module 2 Embedded Processors and Memory Version 2 EE IIT, Kharagpur 1 Lesson 8 General Purpose Processors - I Version 2 EE IIT, Kharagpur 2 In this lesson the student will learn the following Architecture

More information

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program.

What is Good Performance. Benchmark at Home and Office. Benchmark at Home and Office. Program with 2 threads Home program. Performance COMP375 Computer Architecture and dorganization What is Good Performance Which is the best performing jet? Airplane Passengers Range (mi) Speed (mph) Boeing 737-100 101 630 598 Boeing 747 470

More information

Instruction Set Architecture. "Speaking with the computer"

Instruction Set Architecture. Speaking with the computer Instruction Set Architecture "Speaking with the computer" The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture Digital Design

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 18 Advanced Processors II 2006-10-31 John Lazzaro (www.cs.berkeley.edu/~lazzaro) Thanks to Krste Asanovic... TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights The Alpha 21264 Microprocessor: Out-of-Order ution at 600 MHz R. E. Kessler Compaq Computer Corporation Shrewsbury, MA 1 Some Highlights Continued Alpha performance leadership 600 MHz operation in 0.35u

More information

UMBC. Rubini and Corbet, Linux Device Drivers, 2nd Edition, O Reilly. Systems Design and Programming

UMBC. Rubini and Corbet, Linux Device Drivers, 2nd Edition, O Reilly. Systems Design and Programming Systems Design and Programming Instructor: Professor Jim Plusquellic Text: Barry B. Brey, The Intel Microprocessors, 8086/8088, 80186/80188, 80286, 80386, 80486, Pentium and Pentium Pro Processor Architecture,

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2

More information

ISA and RISCV. CASS 2018 Lavanya Ramapantulu

ISA and RISCV. CASS 2018 Lavanya Ramapantulu ISA and RISCV CASS 2018 Lavanya Ramapantulu Program Program =?? Algorithm + Data Structures Niklaus Wirth Program (Abstraction) of processor/hardware that executes 3-Jul-18 CASS18 - ISA and RISCV 2 Program

More information

OOO Execution and 21264

OOO Execution and 21264 OOO Execution and 21264 1 Parallelism ET = IC * CPI * CT IC is more or less fixed We have shrunk cycle time as far as we can We have achieved a CPI of 1. Can we get faster? 2 Parallelism ET = IC * CPI

More information

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System

More information

Computer Organization MIPS Architecture. Department of Computer Science Missouri University of Science & Technology

Computer Organization MIPS Architecture. Department of Computer Science Missouri University of Science & Technology Computer Organization MIPS Architecture Department of Computer Science Missouri University of Science & Technology hurson@mst.edu Computer Organization Note, this unit will be covered in three lectures.

More information

Systems Design and Programming. Instructor: Chintan Patel

Systems Design and Programming. Instructor: Chintan Patel Systems Design and Programming Instructor: Chintan Patel Text: Barry B. Brey, 'The Intel Microprocessors, 8086/8088, 80186/80188, 80286, 80386, 80486, Pentium and Pentium Pro Processor, Pentium II, Pentium

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

ELEC / Computer Architecture and Design Fall 2013 Instruction Set Architecture (Chapter 2)

ELEC / Computer Architecture and Design Fall 2013 Instruction Set Architecture (Chapter 2) ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2013 Instruction Set Architecture (Chapter 2) Victor P. Nelson, Professor & Asst. Chair Vishwani D. Agrawal, James J. Danaher Professor Department

More information

ELC4438: Embedded System Design Embedded Processor

ELC4438: Embedded System Design Embedded Processor ELC4438: Embedded System Design Embedded Processor Liang Dong Electrical and Computer Engineering Baylor University 1. Processor Architecture General PC Von Neumann Architecture a.k.a. Princeton Architecture

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2

More information

CSCE 5610: Computer Architecture

CSCE 5610: Computer Architecture HW #1 1.3, 1.5, 1.9, 1.12 Due: Sept 12, 2018 Review: Execution time of a program Arithmetic Average, Weighted Arithmetic Average Geometric Mean Benchmarks, kernels and synthetic benchmarks Computing CPI

More information