High Performance Microprocessor Architecture Overview (Professor Lee, Yong Surk)



High Performance Microprocessor Architecture Overview
Professor Lee, Yong Surk, Department of Electrical and Electronic Engineering, Yonsei University
E-mail: yonglee@yonsei.ac.kr

This lecture was produced with the support of C & S Technology. It carries no copyright, so anyone may copy and distribute it for non-profit purposes. The laboratory homepage hosts many lectures on high-performance microprocessors, and anyone may download them free of charge.

Professor Lee, Yong Surk
- 1973: B.S., Electrical Eng., Yonsei Univ.
- Ph.D., Univ. of Michigan, Ann Arbor
- 1982 ~ 1992: Designed microprocessors in Silicon Valley, California
- 1989 ~ 1992: Designed the Pentium at Intel
- 1993 ~ : Professor at Yonsei University

References
[1] J. L. Hennessy & D. A. Patterson, Computer Architecture: A Quantitative Approach, Second Edition, Morgan Kaufmann Publishers, 1996.
[2] N. Alexandridis, Design of Microprocessor-Based Systems, Prentice Hall.
[3] D. Sima, T. Fountain, P. Kacsuk, Advanced Computer Architectures: A Design Space Approach, Addison-Wesley, 1997.
[4] IEEE Standards Committee, IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std.
[5] Microprocessor Report, WWW.MPRONLINE.COM

Topics
- Microprocessor & microcontroller
- CISC & RISC
- Superscalar & VLIW
- Pipelining
- Branch strategy
- Cache memory / MMU
- Floating point unit
- Timing
- Top-down design

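µP Classification
- Performance: low / medium / high
- Data width: 4, 8 bits / 16, 32 bits / 32, 64 bits
- Transistors: below a few tens of thousands (low) / above a few million (high)
- Examples: Zilog Z-80, Intel 8051 (low) / ARM, Hitachi SH (medium) / Pentium, UltraSPARC, Alpha (high)

µP vs. µC (1)
- Data width: microprocessor 32, 64 bits; microcontroller 4, 8, 16, 32 bits
- Frequency: microprocessor above a few hundred MHz; microcontroller below a few hundred MHz
- Architecture: microprocessor (Harvard); microcontroller (Von Neumann)
- Cache, MMU, floating point unit, pipelining: microprocessor used; microcontroller not used
- Design methodology: microprocessor top-down; microcontroller top-down, bottom-up

µP vs. µC (2)
- Multiplier: microprocessor Booth multiplier; microcontroller uses the ALU
- Divider: microprocessor SRT divider; microcontroller uses the ALU
- Transistors: microprocessor above a few million; microcontroller below a few million
- Application: microprocessor computer CPU; microcontroller controller
- Cost: microprocessor high; microcontroller low

CISC and RISC
- CISC (Complex Instruction Set Computer)
- RISC (Reduced Instruction Set Computer)

CISC vs. RISC (1) (Ref. [1, 2])
- Number of instructions: CISC (Pentium) a few hundred; RISC a few tens
- Instruction length: CISC variable; RISC fixed (4 bytes)
- Execution time: CISC variable (1-300 clk/inst); RISC fixed (1 clk/inst)
- Pipelining: CISC not efficient; RISC efficient

CISC vs. RISC (2)
- Control: CISC (Pentium) microcode; RISC hardwired
- Registers: CISC small (about 8); RISC large
- Data memory access: CISC memory operands; RISC load, store only
- Design effort: CISC 5-10X, 600 man-years; RISC 1X, 60 man-years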

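CISC microinstruction = RISC instruction
- CISC instruction: Mem ← R1 + Mem
- CISC microinstructions: Temp1 ← Mem; Temp2 ← R1 + Temp1; Mem ← Temp2
- RISC instructions: R2 ← Mem; R3 ← R1 + R2; Mem ← R3

Intel Pentium 4 (CISC) instruction format
- Fields: Prefix (0-4 bytes), Opcode, Mod R/M, SIB, Address displacement, Immediate data
- ~17 byte length

Alpha (RISC) instruction formats (bits 31 to 0)
- ALU: Opcode, Ra, Rb, SBZ, 0, Function, Rc
- Branch: Opcode, Ra, Memory_disp
- Load, Store: Opcode, Ra, Rb, Memory_disp
- Floating point: Opcode, Fa, Fb, Function, Fc

Microprocessor Speed
$\text{Time (per task)} = NI \times CPI \times C$
- NI: number of executed instructions
- CPI: cycles per instruction
- C: clock period (= 1 / clock frequency)

CISC vs. RISC speed
- NI (executed instructions): CISC 1X; RISC 2X
- CPI (cycles per instruction): CISC 1X - 300X; RISC 1X
- C (1 / frequency): CISC 1X - 2X; RISC 1X
- Speed: CISC slow; RISC fast

Benchmark
- MIPS (Million Instructions Per Second)
- VAX MIPS: VAX 11/780
- Dhrystone, Whetstone, Linpack: benchmark engineering
- SPEC (WWW.SPEC.ORG): SPECint - 89, 92, 95, 00, 04; SPECfp - 89, 92, 95, 00, 04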
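The speed relation above (time per task = NI × CPI × C) can be tried out numerically. The following is a minimal sketch; the instruction counts, CPI values, and clock periods are hypothetical, chosen only to fall inside the CISC and RISC ranges given in the comparison, and are not measurements from the lecture.

```python
def execution_time(ni, cpi, cycle_time_s):
    """Time per task in seconds: NI * CPI * C."""
    return ni * cpi * cycle_time_s


# Hypothetical workload: the RISC version executes twice as many
# instructions (NI = 2X) but needs one cycle each and a shorter clock period.
cisc_time = execution_time(ni=1_000_000, cpi=4.0, cycle_time_s=2e-9)
risc_time = execution_time(ni=2_000_000, cpi=1.0, cycle_time_s=1e-9)

speedup = cisc_time / risc_time
print(f"CISC: {cisc_time * 1e3:.1f} ms, RISC: {risc_time * 1e3:.1f} ms")
print(f"RISC speedup under these assumptions: {speedup:.1f}x")
```

With these assumed numbers the larger instruction count of the RISC program is more than offset by its lower CPI and shorter clock period, which is the argument the comparison makes.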

Microprocessor comparison (Microprocessor Report, December 2001, WWW.MPRONLINE.COM)
- Alpha 21264C: 1000 MHz, 15.4 M transistors
- MIPS R14000: 500 MHz, 7.2 M transistors
- Sun UltraSPARC-3: 1000 MHz, 29 M transistors
- Intel Pentium-4: 2000 MHz, 42 M transistors
- Other rows of the table: issue rate, SPEC00b (int/fp), power consumption

Five Stage Pipeline (Ref. [1, 2])
- F: Inst fetch (inst memory access) & PC increment
- D: Inst decode & register read
- E: ALU op or address calculation
- M: Data memory access
- W: Register writeback

Alpha Pipeline: F, D, Arr, Iss, E1, E2, W
- F: Fetch
- D: Decode
- Arr: Arrange to issue
- Iss: Issue & register read
- E1: ALU operation
- E2: Data access
- W: Write back

Scalar µP
- I#1 to I#5 enter the pipeline one per cycle: 1 inst per clock (IPC)

Superscalar µP
- I#1 to I#6 enter the pipeline two per cycle: 2 inst per clock (IPC)

Superscalar
- Data dependency: R1 ← R2 + R3 followed by R5 ← R1 + R4
- Compiler minimizes data dependency
- Dynamic scheduling by hardware
- Most superscalar µPs can issue 3-6 instructions maximum per cycle
- Compatibility
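The data dependency on the superscalar slide (R5 ← R1 + R4 needs the R1 produced by R1 ← R2 + R3) can be sketched as a small issue model. This is an illustrative simplification, not the scheduling logic of any real processor: it checks only read-after-write dependencies inside one issue group, issues strictly in order, and the `Inst` type and `issue_groups` helper are hypothetical names introduced here.

```python
from typing import NamedTuple, Tuple


class Inst(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def has_raw_dependency(first, second):
    """True if `second` reads the register that `first` writes."""
    return first.dest in second.srcs


def issue_groups(program, width=2):
    """Greedy in-order grouping: start a new group when the current one
    is full or the next instruction depends on an instruction in it."""
    groups, current = [], []
    for inst in program:
        blocked = len(current) == width or any(
            has_raw_dependency(prev, inst) for prev in current
        )
        if blocked and current:
            groups.append(current)
            current = []
        current.append(inst)
    if current:
        groups.append(current)
    return groups


i1 = Inst("R1", ("R2", "R3"))   # R1 <- R2 + R3
i2 = Inst("R5", ("R1", "R4"))   # R5 <- R1 + R4 (depends on i1)
i3 = Inst("R6", ("R7", "R8"))   # independent
i4 = Inst("R9", ("R6", "R4"))   # depends on i3

print(len(issue_groups([i1, i2, i3, i4])))  # original order
print(len(issue_groups([i1, i3, i2, i4])))  # reordered, as a compiler might
```

Reordering the independent instruction between the dependent pair lets this two-wide model fill both issue slots, which is what "compiler minimizes data dependency" refers to.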

VLIW (Very Long Instruction Word) µP
- Instruction word slots: Load, Store, Add, Compare, FP add, FP mult, Branch
- Functional units: Mem #1, Mem #2, Int ALU, Int ALU, FP ALU, FP mult, Branch unit
- Multiported register file

VLIW (Ref. [3], p. 176)
- Static scheduling by compiler
- Complex compiler, simple hardware
- No compatibility
- Cannot program in assembly language

Scalar µP
- I#1 to I#5, each passing through F, D, E, M, W
- M: memory access

Harvard Architecture
- CPU with separate Inst bus and Data bus
- Inst memory and Data memory

Von Neumann Architecture
- CPU with Inst bus and Data bus
- Unified memory
- Two-port memory

Sub R3, R1, R2 ; R3 ← R1 - R2
- F: Inst fetch & PC increment
- D: Inst decode & R1, R2 read
- E: Temp ← R1 - R2, status available
- M: No op
- W: R3 ← Temp; write status reg

Load R3, R1, R2 ; R3 ← [R1 + R2]
- F: Inst fetch & PC increment
- D: Inst decode & R1, R2 read
- E: R1 + R2; address calculation
- M: [R1 + R2]; data read
- W: R3 ← [R1 + R2]

Store R1, R2, R3 ; [R1 + R2] ← R3
- F: Inst fetch & PC increment
- D: Inst decode & R1, R2, R3 read
- E: R1 + R2; address calculation
- M: [R1 + R2] ← R3; data write
- W: No op

Br 85 ; Branch to PC + 85
- F: Inst fetch & PC increment
- D: Inst decode & PC + 85 calculation
- E: PC ← PC + 85; branch
- M: No op
- W: No op

[Figure: datapath with PC, +4 incrementer, MUXes, pipeline flip-flops (FF), ADD for BR ADDR, Inst memory, Decode, REG, ALU, Data memory]

Edge-Triggered MS Flip-Flop
[Figure: D flip-flop (D FF) with input D and output Q]

Inst Cache Miss
- Cache miss penalty
[Figure: pipeline chart for I#1 to I#5 with an instruction miss]
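The stage lists above describe what one instruction does in each of F, D, E, M, W. How several instructions overlap in time can be sketched with a small chart generator. This is a minimal sketch of an ideal pipeline with no stalls, misses, or branches; `pipeline_chart` is a hypothetical helper name, not something from the lecture.

```python
STAGES = ["F", "D", "E", "M", "W"]


def pipeline_chart(n_instructions):
    """Return (rows, total_cycles) for an ideal pipeline with no stalls."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        # Instruction i enters F in cycle i, so it is preceded by i idle slots.
        row = ["."] * i + STAGES
        row += ["."] * (total_cycles - len(row))
        rows.append(row)
    return rows, total_cycles


rows, total_cycles = pipeline_chart(5)
for number, row in enumerate(rows, start=1):
    print(f"I#{number}: " + " ".join(row))
print("total cycles:", total_cycles)
```

Each row is one instruction and each column one clock cycle. Five instructions finish in 5 + 5 - 1 = 9 cycles; an instruction cache miss would add its penalty on top of this ideal count.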

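Load Interlock
- R1 ← [R2 + 85]
- R4 ← R1 + R3
- The dependent instruction is frozen for 1 cycle (XXX marks the empty slot); the loaded value then reaches it through the bypass.

Freeze and Advance
[Figure: In → MUX (inputs 0 and 1) → D flip-flop (D, Q) → Out, with MUX select "Advance"]
- 0: Freeze
- 1: Advance
[Figure: In → D flip-flop (D, Q) → Out, with an AND gate and a Freeze signal]

Data Bypassing (Forwarding)
- R1 ← R2 + R3
- R5 ← R1 - R4
- The result of the first instruction is bypassed to the second.
[Figure: datapath with pipeline flip-flops (FF), MUXes, Inst memory, Decode, ADD for BR ADDR, REG, ALU, Data memory, and bypass paths]

Branch Strategy (Ref. [1])
- BTB (Branch Target Buffer)
- Delayed branch
- Predict-not-taken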
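The load interlock and bypassing slides can be summarized as a stall count. The sketch below assumes the five-stage F, D, E, M, W pipeline from earlier, a consumer that needs its operand at the start of its E stage when bypassing is available, and, without bypassing, a register file that can be read in D in the same cycle the producer writes it in W. These timing assumptions and the `stall_cycles` helper are illustrative and not taken from the lecture.

```python
def stall_cycles(producer_kind, distance=1, bypass=True):
    """Cycles the consumer is frozen.

    producer_kind: "alu" (result ready at end of E) or
                   "load" (result ready at end of M).
    distance: 1 if the consumer immediately follows the producer.
    Cycle numbering: the producer is in F, D, E, M, W in cycles 0..4.
    """
    if producer_kind not in ("alu", "load"):
        raise ValueError("producer_kind must be 'alu' or 'load'")
    if bypass:
        ready = 2 if producer_kind == "alu" else 3  # cycle that produces the value
        consumer_e = distance + 2                   # consumer's E without stalls
        return max(0, ready + 1 - consumer_e)
    # No bypass: the consumer's D must not come before the producer's W (cycle 4).
    consumer_d = distance + 1
    return max(0, 4 - consumer_d)


print(stall_cycles("alu", distance=1, bypass=True))
print(stall_cycles("load", distance=1, bypass=True))
print(stall_cycles("alu", distance=1, bypass=False))
```

Under these assumptions an ALU result can be forwarded with no freeze, while a load followed immediately by its consumer still costs one frozen cycle, matching the "1 cycle frozen" note on the load interlock slide.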

Delayed Branch
- Branch: F D E M W
- Delay slot: F D E M W
- Target fetch (from the target address): F D E M W
- Compiler has to fill the delay slot

Predict-not-taken
- Untaken branch: instructions i + 1 and i + 2 continue through the pipeline
- Taken branch: instruction i + 1 has already been fetched (F) and is flushed (XXX); the branch target is then fetched from the target address

µP System (fast to slow)
- CPU reg file: 8 reg
- SRAM cache mem: 512 KB
- DRAM main mem: 256 MB
- H.D.: 30 GB
- Block size: 16-64 B
- Page size: 2K-8 KB

Microprocessor frequency vs. DRAM access time
- Microprocessor frequency (period): 10 MHz (100 ns); 100 MHz (10 ns); 1 GHz (1 ns)
- DRAM access time: 100 ns; 50 ns; 25 ns

Address Translation
- Virtual (logical) address space: $2^{32}$ = 4 G, divided into 4 KB pages
- Memory Management Unit (MMU)
- Physical address space, divided into 4 KB pages

IEEE Floating Point Standard (Ref. [4])
- Single precision: fields S, E, F
- Double precision: fields S, E, F
$v = (-1)^{S} \cdot 2^{e} \cdot (1.F), \quad e = E - \text{bias}$
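The value formula above, v = (-1)^S · 2^e · (1.F) with e = E - bias, can be checked against a real single-precision number. The sketch below uses the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits, bias 127); those widths come from the standard rather than from the slide text, and the helper names are hypothetical. Only normalized numbers are handled.

```python
import struct


def decode_single(x):
    """Split a float, rounded to single precision, into (S, E, F) fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31
    e_field = (bits >> 23) & 0xFF
    f_field = bits & 0x7FFFFF
    return s, e_field, f_field


def value_from_fields(s, e_field, f_field):
    """Apply v = (-1)^S * 2^e * (1.F), e = E - bias, for normalized numbers."""
    if not 1 <= e_field <= 254:
        raise ValueError("only normalized numbers are handled in this sketch")
    e = e_field - 127                     # bias for single precision
    significand = 1 + f_field / 2 ** 23   # the implicit leading 1 plus F
    return (-1) ** s * 2.0 ** e * significand


s, e_field, f_field = decode_single(-6.5)
print(s, e_field, hex(f_field))
print(value_from_fields(s, e_field, f_field))
```

Here -6.5 is -1.625 × 2^2, so the sign bit is 1, the stored exponent is 2 + 127 = 129, and the fraction field encodes 0.625.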

[Figure: representable ranges of integer, single precision, and double precision numbers]

Timing
- 1 GHz clock (= 1 ns period)
- 30 cm: the distance light travels in 1 ns
- µP chip: 2 cm × 2 cm
- A signal from FF1 must reach FF2 within the 1 ns period
- 3 cm of metal: Δt = 0.1 ns

Top-Down Design Steps
1. Simulator (C): 20%
2. HDL model: 20%
3. Verification: 20%
4. Synthesis, full custom: 20%
5. Verification: 20%
Total: 100%
6. Fault grading: additional 20%
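The numbers on the timing slide (1 GHz clock, 30 cm, 3 cm of metal with Δt = 0.1 ns) are consistent with a signal travelling at the speed of light. The following is a minimal sketch of that arithmetic; treating the wire as a speed-of-light path is an assumption that gives a best-case bound, since real on-chip wires are slower, and the function names are hypothetical.

```python
LIGHT_SPEED_CM_PER_NS = 30.0  # about 3e8 m/s expressed in cm per ns


def clock_period_ns(freq_ghz):
    """Clock period in ns for a frequency given in GHz."""
    return 1.0 / freq_ghz


def wire_delay_ns(length_cm, speed_cm_per_ns=LIGHT_SPEED_CM_PER_NS):
    """Best-case propagation delay over a wire of the given length."""
    return length_cm / speed_cm_per_ns


period = clock_period_ns(1.0)
delay = wire_delay_ns(3.0)
print(f"period = {period} ns, wire delay = {delay:.2f} ns")
print(f"share of the clock period spent on the wire: {delay / period:.0%}")
```

Even under this optimistic assumption, 3 cm of wiring on a 2 cm × 2 cm chip consumes about a tenth of the 1 ns budget between FF1 and FF2.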
