Agenda. Pentium III Processor New Features Pentium 4 Processor New Features. IA-32 Architecture. Sunil Saxena Principal Engineer Intel Corporation

IA-32 Architecture
Sunil Saxena, Principal Engineer, Intel Corporation
September 11, 2000
Copyright 2000 Intel Corporation. Linux Supercluster Users Conference

Agenda
  Pentium III Processor New Features
  Pentium 4 Processor New Features
  Pentium 4 Processor Micro-architecture

IA Processor Roadmap
  Extends IA Headroom, Scalability and Availability for the Most Demanding Environments
  [Chart: performance roadmap. Labels: Pentium III Xeon processor, Cascades, Foster, Future IA; Itanium(TM) processor, McKinley, Madison (IA-64 Perf), Deerfield (IA-64 Price/Perf). Process labels: .18µ, .13µ. Caption: Outstanding Performance for 32 Bit Volume Apps.]
  Strong Execution on Itanium Processor, Continued Focus on the Long Term

Pentium III Processor

Pentium III Processor New Features
  36-bit Physical Addressing
    Physical Address Extension - PAE-36
    Page Size Extensions - PSE-36
  Page Attribute Table
  Fast Floating-point save/restore
  New Instructions
  New Exceptions

36-bit Addressing
  PSE-36
    4GB mapped through 4K of page directories and 4MB of page tables
    Memory above 4GB is only accessible as 4MB pages
    Operating system can freely use both 4KB and 4MB pages without PDE structure change
    All 4KB pages and page tables MUST reside below the 4GB boundary
    Reduces effort needed to develop & support changes in the virtual memory subsystem
  PAE-36
    4GB mapped through 16K of page directories and 16MB of page tables
    All memory accessible as 4KB or 2MB pages
    OS needs to load PDEPTRs for mapping changes on writes to CR3
    CONFIG_HIMEM to enable more than 4GB physical memory

4KB Page Translation
  [Diagram: CR3 points to the Page Directory (1024 entries). The linear address selects a PDE, which points to a Page Table (1024 entries). The selected PTE points to the 4-KB page.]
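The 4KB translation slide implies a fixed split of the 32-bit linear address: 1024 entries need 10 index bits at each level, leaving 12 bits of offset inside the 4-KB page. The following is a minimal sketch of that arithmetic only; the function and constant names are illustrative and do not come from the slides.

```python
ENTRIES_PER_TABLE = 1024   # from the slide: Page Directory and Page Table each have 1024 entries
ENTRY_BYTES = 4            # 4-byte entries (non-PAE paging)
PAGE_BYTES = 4096          # 4-KB page

def split_linear_address_4k(linear):
    """Split a 32-bit linear address into (PDE index, PTE index, page offset)."""
    if not 0 <= linear < 2**32:
        raise ValueError("linear address must fit in 32 bits")
    pde_index = (linear >> 22) & 0x3FF   # top 10 bits select the page directory entry
    pte_index = (linear >> 12) & 0x3FF   # next 10 bits select the page table entry
    offset = linear & 0xFFF              # low 12 bits select the byte within the page
    return pde_index, pte_index, offset

# Size of the structures needed to map the full 4GB with 4-KB pages:
page_directory_bytes = ENTRIES_PER_TABLE * ENTRY_BYTES                     # one directory
all_page_tables_bytes = ENTRIES_PER_TABLE * ENTRIES_PER_TABLE * ENTRY_BYTES  # 1024 tables

print(split_linear_address_4k(0xC0101234))
print(page_directory_bytes, all_page_tables_bytes)
```

The two sizes computed at the end correspond to the "4K of page directories and 4MB of page tables" figures quoted for PSE-36.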

4KB PAE Translation
  [Diagram: CR3 points to the Page Directory Pointer Table. The linear address selects a PDPE, which points to a Page Directory (512 entries). The selected PDE points to a Page Table (512 entries). The selected PTE points to the 4-KB page.]

4MB Translation
  [Diagram: CR3 points to the Page Directory (1024 entries). The linear address selects a PDE, which points directly to the 4-MB page.]
  Page Directory Entry
    Provides bits of physical address of 4MB page
    Provides bits of physical address of 4MB page (new) (bits are currently RESERVED)
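The PAE and 4MB diagrams change how the same 32-bit linear address is carved up. As a rough sketch: with 512-entry tables each level consumes 9 bits, and the remaining top 2 bits select one of four Page Directory Pointer Table entries (the four-entry count is standard IA-32 PAE behaviour, not stated on the slide). For 4MB pages the low 22 bits are the page offset. Function names are illustrative.

```python
def split_linear_address_pae_4k(linear):
    """PAE, 4-KB pages: (PDPE index, PDE index, PTE index, page offset)."""
    if not 0 <= linear < 2**32:
        raise ValueError("linear address must fit in 32 bits")
    pdpe_index = (linear >> 30) & 0x3     # 2 bits: one of 4 directory pointer entries
    pde_index = (linear >> 21) & 0x1FF    # 9 bits: one of 512 page directory entries
    pte_index = (linear >> 12) & 0x1FF    # 9 bits: one of 512 page table entries
    offset = linear & 0xFFF               # 12 bits: byte within the 4-KB page
    return pdpe_index, pde_index, pte_index, offset

def split_linear_address_4m(linear):
    """Non-PAE, 4-MB pages: (PDE index, page offset). No page table is walked."""
    if not 0 <= linear < 2**32:
        raise ValueError("linear address must fit in 32 bits")
    pde_index = (linear >> 22) & 0x3FF    # 10 bits: one of 1024 page directory entries
    offset = linear & 0x3FFFFF            # 22 bits: byte within the 4-MB page
    return pde_index, offset

print(split_linear_address_pae_4k(0xC0101234))
print(split_linear_address_4m(0xC0101234))
```

Halving the entries per table (1024 to 512) follows from the PAE entries being twice as wide, which is what leaves room for the extra physical address bits.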

Example Using PSE-36
  [Diagram: physical memory from 0 to 8GB. CR3 points to a Page Directory with 4-byte entries; its entries either map 4MB pages directly or point to a Page Table with 4-byte entries that maps 4K pages. Above 4GB: ONLY 4MB PAGES. Below 4GB: 4K & 4MB PAGES.]

Page Attribute Table (PAT)
  Physical Memory Attributes Described through the Page-Tables
    Builds upon enhanced memory type capability provided via MTRRs in Pentium Pro processor
    Relaxes MTRR alignment/length requirements
    Builds upon PCD/PWT bits on IA-32 Architecture
    These interact with effective memory type determination
  PAT Architecture
    PAT is an 8-entry table indexed via PCD, PWT, and Resv. bits
    Allows up to 8 memory attributes defined by the page tables
    PAT is always enabled when Paging is used
    Default table entries fully compatible with PCD/PWT/Resv settings
    PAT entries R/W programmable via RDMSR/WRMSR (0x277)
    8 bits per entry; 3 bits for attribute with other bits reserved
    Memory attributes as specified by Pentium Pro processor
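The PAT lookup described above can be modelled as plain bit arithmetic: three page-table bits form an index from 0 to 7, and each index selects one byte of the 64-bit value held in MSR 0x277. The sketch below assumes the memory-type encodings and the power-on default value documented in Intel's architecture manuals; neither appears on the slide, and the helper names are made up for illustration. It only manipulates integers and does not touch any MSR.

```python
IA32_PAT_MSR = 0x277                 # MSR number given on the slide
DEFAULT_PAT = 0x0007040600070406     # assumed power-on default (from Intel documentation)

# Assumed 3-bit memory type encodings (from Intel documentation, not the slide).
MEMORY_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB", 7: "UC-"}

def pat_index(pat_bit, pcd, pwt):
    """Combine the three page-table bits into a PAT entry index (0..7)."""
    return ((pat_bit & 1) << 2) | ((pcd & 1) << 1) | (pwt & 1)

def pat_memory_type(pat_msr, index):
    """Return the memory type name stored in the given PAT entry."""
    entry = (pat_msr >> (8 * index)) & 0xFF   # 8 bits per entry
    return MEMORY_TYPES[entry & 0x7]          # only the low 3 bits carry the attribute

def with_pat_entry(pat_msr, index, encoding):
    """Return a new PAT value with one entry replaced (other entries untouched)."""
    shift = 8 * index
    return (pat_msr & ~(0xFF << shift)) | ((encoding & 0x7) << shift)

for pcd in (0, 1):
    for pwt in (0, 1):
        idx = pat_index(0, pcd, pwt)
        print(f"PCD={pcd} PWT={pwt} -> entry {idx}: {pat_memory_type(DEFAULT_PAT, idx)}")
```

With the assumed default, the four lowest entries reproduce the legacy PCD/PWT meanings, which matches the slide's claim that the default table is fully compatible with existing settings.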

Page Attribute Table (PAT)
  PAT Architecture (continued)
    PAT Memory Types interact with MTRRs
    As architecturally specified by Pentium Pro processor
    Implementation specific combinations remain undefined
    Should not be depended upon by system software
  Precautions
    OS Uses Page Directory as a Page Table:
      Restricted to using 4 lowest PAT entries
      PAT bit 7 in 4K PTE is PS bit when used as a PDE
    Memory type changes for pages require TLB invalidation
      Follow procedure as when changing MTRRs: cache flush, TLB invalidation
    PAT entries on multiple processors must be maintained in consistent manner by OS
      All processors have same values in PAT

Page Attribute Table (PAT)
  Precautions (continued)
    Page Aliasing
      PAT maintains memory types according to linear addresses
      Architecture allows OS to map single physical page with 2 linear addresses containing differing types
      This may lead to undefined results and must be avoided
  PAT Uses
    Essentially unlimited MTRRs
    Provide support for more devices (frame buffers, RAID cards, etc.) to map memory as WC
    Allows mapping system memory for specific optimizations
      Memory shared with 3D accelerator/CPU for textures
      Reduce eviction, read-for-ownership bus transactions and cache thrashing for common operations such as memory fill

Fast Floating Point Save/Restore
  These instructions minimize cost of saving/restoring Floating Point/MMX Technology State
  Does NOT re-initialize the FPU state after saving
  Performance improvements come from more natural format and alignment of the CPU state
  State area is larger
    512 bytes
    MUST be aligned on 16 byte boundary, else GP(0) fault
  Use of reserved fields risks incompatibility with future Intel Architecture processors
  FXSAVE does not check for unmasked exceptions (i.e. like FNSAVE)
  FXRSTOR does not fault when loading an image that contains pending exceptions

Pentium III New Instructions
  [Diagram: Pentium III processor = Dynamic Execution (Core Architecture); FP plus New SIMD FP Instr. (Floating Point Arch.); MMX Technology plus New Media Instr. (Multimedia Architecture); P6 bus + WC I/O plus Streaming Mem Instr. (Memory Architecture)]
  52 New SIMD Single Precision Floating Point Instructions
    up to 4 FP results per cycle
    Eight 128 bit registers
    4 x Single precision FP numbers
  12 New Media Instructions
  8 New Cacheability Instructions
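The size and alignment rules for the FXSAVE/FXRSTOR state area translate into a small piece of address arithmetic that an OS would apply when laying out its per-task save area. The following sketch only illustrates that arithmetic on integer addresses; the helper names are hypothetical, and nothing here executes FXSAVE.

```python
FXSAVE_AREA_BYTES = 512   # from the slide: state area is 512 bytes
FXSAVE_ALIGNMENT = 16     # from the slide: must be 16-byte aligned, else GP(0)

def is_valid_fxsave_address(address):
    """True if an FXSAVE area starting at this address meets the alignment rule."""
    return address % FXSAVE_ALIGNMENT == 0

def carve_fxsave_area(buffer_start, buffer_len):
    """Return the first aligned address inside a raw buffer that can hold the
    512-byte area, or None if the buffer is too small once alignment is applied."""
    aligned = (buffer_start + FXSAVE_ALIGNMENT - 1) & ~(FXSAVE_ALIGNMENT - 1)
    if aligned + FXSAVE_AREA_BYTES > buffer_start + buffer_len:
        return None
    return aligned

# Over-allocating by (alignment - 1) bytes always leaves room for an aligned area.
print(hex(carve_fxsave_area(0x1004, FXSAVE_AREA_BYTES + FXSAVE_ALIGNMENT - 1)))
```

The over-allocation trick at the end is one common way to satisfy the rule when the allocator itself gives no alignment guarantee.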

Prefetching Instruction
  Prefetch gets a cacheline at a time
  Prefetch Hint (Load) Instructions
    Instructions do not fault
    Retires quickly to free up machine resources
  Hints to cache at different levels
    Store in different levels of cache hierarchy
    Don't store in the cache hierarchy (stream)
  Potential OS tuning uses
    e.g. TCP/IP Checksum gets ~2x speedup

Streaming Store Instruction
  Store data to memory minimizing cache pollution
  Potential OS tuning benefits
    128 bit registers used with streaming store to zero pages ~4x faster
    mem copy ~2x faster using prefetch/stream together

New Exceptions
  Interrupt vector 19 used to invoke unmasked exception handlers
  Provide larger (512 bytes of state) context record to handler
  Handlers need to account for SIMD nature of Pentium III SSE numeric exceptions
    One instruction can generate multiple exceptions
    Exceptions are precise (reported when detected)
  Pentium III SSE Instructions architecturally separate from x87-FP
    Pentium III SSE Instructions do not report x87-FP/MMX Technology exceptions
  New handlers must include IEEE filter to decode and emulate exception raising SIMD instructions

Pentium 4 Processor

Pentium 4 Processor New Features
  SSE2 Instructions
  Enhanced Prefetch Instructions
  System Bus and Cache Enhancements
  OS Recommendation
  New Instruction support

Pentium 4 Architecture Overview
  Willamette is the next generation IA-32 processor microarchitecture
  New micro-architecture
    ~1.4x average performance of Pentium III processor family on same process
    Enables faster processor speeds (1 GHz+)
    Trace Cache for Instruction Decode
  Willamette New Instructions
  New platform (chipsets, AGP4X)

Pentium 4 New Instructions
  New 128 bit arithmetic instructions
    Extend MMX technology instructions from 64 bit to 128 bit data type
    Operates on XMM registers instead of MMX/x87-FP registers
  New 128-bit integer and SIMD-Integer instructions
    Memory operands MUST be 128-bit aligned! Will cause an exception during execution if not aligned.
    Packed 32 * 32 bit Multiply
    Packed 64 bit Add/Subtract
    Shift, Shuffle, Unpack, Move, Conversion
  New SIMD Double Precision FP instructions
    Full complement of FP arithmetic operations
    Packed/Scalar DP-SP conversions
  New cache / memory management instructions
    Cache line flush instruction
    Fences (LFence / MFence)
    New streaming store instructions

Streaming SIMD Extensions 2
  [Diagram: register files.
   Floating Point Registers (Scalar/packed SIMD-SP-FP, SIMD-DP-FP, 128-bit Integer): XMM0 ... XMM7.
   Integer / x87 Registers (64-bit Integer, x87 data): FP0 or MM0 ... FP7 or MM7.]

Example SIMD Add (ADDPD)
  Effectively performs two double precision ops in one cycle
    a1+b1=c1 in parallel with a0+b0=c0
  Useful for matrix operations
  [Diagram: 128-bit registers. (a1, a0) + (b1, b0) = (c1, c0)]
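The ADDPD slide describes two independent double-precision additions on the halves of a 128-bit register. A minimal software model of that data layout, assuming a register is represented as 16 little-endian bytes with element 0 in the low half, might look like this. It models the arithmetic only and says nothing about timing; the function names are illustrative.

```python
import struct

def pack_xmm(d0, d1):
    """Build a 16-byte value holding two doubles: d0 in the low half, d1 in the high half."""
    return struct.pack("<2d", d0, d1)

def unpack_xmm(xmm):
    """Return (d0, d1) from a 16-byte packed-double value."""
    return struct.unpack("<2d", xmm)

def addpd(xmm_a, xmm_b):
    """Model of a packed double-precision add: both lanes are added independently."""
    a0, a1 = unpack_xmm(xmm_a)
    b0, b1 = unpack_xmm(xmm_b)
    return pack_xmm(a0 + b0, a1 + b1)   # c0 = a0 + b0, c1 = a1 + b1

a = pack_xmm(1.5, 2.0)
b = pack_xmm(0.25, 4.0)
print(unpack_xmm(addpd(a, b)))
```

For a matrix row update, the same call would be applied to successive pairs of elements, which is where the slide's "useful for matrix operations" remark comes from.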

Prefetches
  The Pentium 4 processor has automatic prefetches which
    Work on large buffers
    Have sequential access
  Even fewer prefetches necessary
  Use sequential access to buffers and get prefetches for free

Prefetches
  But prefetch instructions may still be the best solution in some cases
    PrefetchNTA reduces cache evictions of useful data ( x 1.15x gain)
    Benefits unusual (i.e., non-contiguous) data access patterns
    Can maximize read bandwidth to system memory
  Increase fetch-ahead distance since memory-latency/computation delta increases

System Bus & Cache Enhancements
  The Pentium 4 system bus is an evolutionary extension of the P6 bus
    3.2 GByte/sec data transfer rate
    100MHz quad pumped data bus - similar to AGP-4X
    Source synchronous 64 bit data bus
  Caches
    Trace cache for decoded instructions
    128 byte cache lines with 64 byte sectors
    256K on-die, 2nd level write-back, unified data and instruction cache
  APIC
    Messages now sent over front side bus
    Physical destination mode expanded to 8 bits
    ISR, IRR, TMR implementation increased to 256 bits
    Remote read is no longer supported

OS Recommendations
  All spin-loops should include the PAUSE instruction
    Backwards compatible with prior IA-32 processors
    Significant performance benefit in future IA-32 processors
    Already done in 2.4-test* kernels
  Cache line size is 128 bytes with 64 byte sectors
    Impact to hot locks
      Hot locks should be on separate 64 byte sectors
    Impact to data structure alignment
      128 byte line allocation in cache
  Use Non-execution based Timing Loops!
    Already done in 2.4-test* kernels
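Two numbers on these slides can be checked with simple arithmetic: the 3.2 GByte/sec bus rate follows from a 100MHz clock, four transfers per clock, and a 64-bit data path, and the hot-lock advice reduces to making sure two lock addresses fall in different 64-byte sectors. The sketch below works through both; the helper names and the example addresses are illustrative only.

```python
BUS_CLOCK_HZ = 100_000_000      # 100MHz bus clock
TRANSFERS_PER_CLOCK = 4         # quad pumped
BUS_WIDTH_BYTES = 64 // 8       # 64-bit data bus

SECTOR_BYTES = 64               # cache sector size from the slide
LINE_BYTES = 128                # cache line size from the slide

def bus_bandwidth_bytes_per_sec():
    """Peak data transfer rate implied by the bus parameters above."""
    return BUS_CLOCK_HZ * TRANSFERS_PER_CLOCK * BUS_WIDTH_BYTES

def share_sector(addr_a, addr_b):
    """True if both addresses fall in the same 64-byte sector."""
    return addr_a // SECTOR_BYTES == addr_b // SECTOR_BYTES

def share_line(addr_a, addr_b):
    """True if both addresses fall in the same 128-byte cache line."""
    return addr_a // LINE_BYTES == addr_b // LINE_BYTES

print(bus_bandwidth_bytes_per_sec() / 1e9, "GByte/sec")
# Two hypothetical lock addresses 0x20 bytes apart land in the same sector:
print(share_sector(0x1000, 0x1020), share_sector(0x1000, 0x1040))
```

Padding each hot lock out to a 64-byte boundary is enough to satisfy the slide's recommendation; padding to 128 bytes additionally keeps the locks in separate lines.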

Pentium 4 New Instruction Support
  FXSAVE/FXRSTOR support for Pentium 4 state
    Already done if enabled for Pentium III processor (Internet Streaming SIMD Extensions)
    No New State!
    Already done in 2.4-test* kernels
  New Exception Handlers
    Double Precision SIMD capable
    IEEE Compliant
  Prefetch and Streaming Store Optimizations
    Integer state streaming store instruction MOVNTi
    For zeroing, memcpy, etc.
    Does not use FP state so DNA is avoided

Pentium 4 Processor Micro-architecture
  Next Generation IA-32 Micro-architecture

Agenda
  IA-32 Processor Roadmap
  Design Goals
  Frequency
  Instructions Per Cycle
  Summary

Pentium 4 Processor
  [Chart: performance over time for successive micro-architectures: 486 Micro-architecture, P5 Micro-Architecture, P6 Micro-Architecture, NetBurst Micro-Architecture (NOW)]

Pentium 4 Processor Design Goals
  Deliver world class performance across both existing and emerging applications
  Deliver performance headroom and scalability for the future
  Micro-architecture that will Drive Performance Leadership for the Next Several Years

CPU Architecture 101
  Delivered Performance = Frequency * Instructions Per Cycle
  (this section: Frequency)

Frequency
  What limits frequency?
    Process technology
    Microarchitecture
  On a given process technology
    Fewer gates per pipeline stage will deliver higher frequency
  Frequency is driven by Microarchitecture

Netburst(TM) Micro-architecture Pipeline vs P6
  Basic P6 Pipeline (intro at 733MHz, .18µ):
    Fetch, Fetch, Decode, Decode, Decode, Rename, ROB Rd, Rdy/Sch, Dispatch, Exec
  Basic Pentium 4 Processor Pipeline (intro at 1.4GHz, .18µ):
    TC Nxt IP, TC Fetch, Drive, Alloc, Rename, Que, Sch, Sch, Sch, Disp, Disp, RF, RF, Ex, Flgs, Br Ck, Drive
  Hyper pipelined Technology enables industry leading performance and clock rate

Hyper Pipelined Technology
  [Chart: frequency from introduction to today for the P5, P6, and Netburst micro-architectures. Labeled points: 60MHz, 166MHz, 1.13GHz, 1.4GHz.]

CPU Architecture 101
  Delivered Performance = Frequency * Instructions Per Cycle
  (this section: Instructions Per Cycle)

Improving Instructions Per Cycle
  Improve efficiency
    Branch prediction
  Do more things in a clock
  Reduce time it takes to do something
    Reducing latency

Branch Prediction
  Accurate branch prediction is key to enabling longer pipelines
  Dramatic improvement over P6 branch predictor:
    8x the size (4K)
    Eliminated 1/3 of the mispredictions
  Proven to be better than all other publicly disclosed predictors (g-share, hybrid, etc.)

Execution Trace Cache
  Advanced L1 instruction cache
    Caches decoded IA-32 instructions (uops)
  Removes decoder pipeline latency
  Capacity is ~12K uops
  Integrates branches into single line
    Follows predicted path of program execution
  Execution Trace Cache feeds fast engine

Execution Trace Cache
  Program layout:
         1  cmp
         2  br -> T1
            (unused code)
    T1:  3  sub
         4  br -> T2
            (unused code)
    T2:  5  mov
         6  sub
         7  br -> T3
            (unused code)
    T3:  8  add
         9  sub
        10  mul
        11  cmp
        12  br -> T4
  Trace Cache Delivery:
     1  cmp
     2  br T1
     3  T1: sub
     4  br T2
     5  T2: mov
     6  sub
     7  br T3
     8  T3: add
     9  sub
    10  mul
    11  cmp
    12  br T4

Advanced Dynamic Execution
  Extends basic features found in P6 core
  Very deep speculative execution
    126 instructions in flight (3x P6)
    48 loads (3x P6) and 24 stores (2x P6)
  Provides larger window of visibility
    Better use of execution resources
  Deep Speculation Improves Parallelism
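The Execution Trace Cache example above can be read as a simple procedure: start at the first instruction, copy instructions into the trace in execution order, and at each branch continue from the predicted target instead of the fall-through code. The sketch below reproduces the slide's 12-entry trace under the assumption that every branch is predicted taken. The data layout and the `build_trace` name are invented for illustration and say nothing about how the hardware stores traces.

```python
# Each entry: (label or None, operation, branch target or None).
# "(unused code)" stands for the skipped fall-through instructions on the slide.
PROGRAM = [
    (None, "cmp", None),
    (None, "br", "T1"),
    (None, "(unused code)", None),
    ("T1", "sub", None),
    (None, "br", "T2"),
    (None, "(unused code)", None),
    ("T2", "mov", None),
    (None, "sub", None),
    (None, "br", "T3"),
    (None, "(unused code)", None),
    ("T3", "add", None),
    (None, "sub", None),
    (None, "mul", None),
    (None, "cmp", None),
    (None, "br", "T4"),
]

def build_trace(program, start=0):
    """Follow the predicted (always-taken) path and return the ops in trace order."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    trace, pc = [], start
    while pc < len(program):
        _, op, target = program[pc]
        trace.append(op if target is None else f"{op} {target}")
        if target is None:
            pc += 1                      # straight-line code: next instruction
        elif target in labels:
            pc = labels[target]          # predicted-taken branch: jump to target
        else:
            break                        # target lies outside the listed code (T4)
    return trace

for number, op in enumerate(build_trace(PROGRAM), start=1):
    print(number, op)
```

None of the "(unused code)" entries reach the trace, which is the point of the slide: the fetch engine sees one contiguous line of twelve operations.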

Improving Instructions Per Cycle
  Improve efficiency
    Branch prediction
  Do more things in a clock
  Reduce time it takes to do something
    Reducing latency

Rapid Execution Engine
  Dramatically lower ALU latency
    P6:  1 clock at 1GHz = 1ns
    P4P: ½ clock at >1.4GHz = <0.36ns

Example with Higher IPC and Faster Clock!
  Code Sequence: Ld, Add, Add, Ld, Add, Add
    10 clocks, 10ns, IPC = 0.6
    Pentium 4: 6 clocks, 4.3ns, IPC = 1.0

Recap
                          Pentium III Processor   Pentium 4 Processor   Relative Improvement
  Frequency               1 GHz                   > 1.4 GHz             > 1.4
  Adder Speed             1 ns                    < .36 ns              > 2.8
  L1 Cache Speed          3 ns                    < 1.42 ns
  L1 Cache Size           16 KB                   8 KB
  L1 Cache Bandwidth      16 GB/sec               > 44.8 GB/sec         > 2.8
  L2 Cache Bandwidth      16 GB/sec               > 44.8 GB/sec         > 2.8
  Uop Fetch Bandwidth     3 billion/sec           > 4.2 billion/sec     > 1.4
  Adder Bandwidth         2 billion/sec           > 5.6 billion/sec     > 2.8
  Branch targets
  Instructions in flight
  Loads in flight
  Stores in flight
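The worked example and the "CPU Architecture 101" formula fit together as a few lines of arithmetic. The sketch below recomputes the slide's figures for the six-instruction sequence. The 1GHz clock for the 10-clock case is an assumption inferred from "10 clocks, 10ns"; the 1.4GHz clock comes from the Pentium 4 figures quoted elsewhere in the deck. Function names are illustrative.

```python
def ipc(instructions, clocks):
    """Instructions per cycle for a code sequence."""
    return instructions / clocks

def elapsed_ns(clocks, frequency_ghz):
    """Execution time in nanoseconds: one clock lasts 1/frequency ns at GHz rates."""
    return clocks / frequency_ghz

def delivered_performance(frequency_ghz, instructions_per_cycle):
    """Delivered Performance = Frequency * Instructions Per Cycle (instructions per ns)."""
    return frequency_ghz * instructions_per_cycle

INSTRUCTIONS = 6  # Ld, Add, Add, Ld, Add, Add

cases = {
    "10-clock case (assumed 1GHz)": {"clocks": 10, "ghz": 1.0},
    "Pentium 4 (1.4GHz)": {"clocks": 6, "ghz": 1.4},
}
for name, cpu in cases.items():
    rate = ipc(INSTRUCTIONS, cpu["clocks"])
    print(f"{name}: IPC={rate:.1f}, "
          f"time={elapsed_ns(cpu['clocks'], cpu['ghz']):.1f}ns, "
          f"performance={delivered_performance(cpu['ghz'], rate):.2f} instr/ns")

# Half-clock ALU latency from the Rapid Execution Engine slide, at 1.4GHz:
alu_latency_ns = elapsed_ns(0.5, 1.4)
print(f"ALU latency: {alu_latency_ns:.3f}ns")
```

Because both factors improve at once, the overall gain in this example is the product of the clock ratio (1.4) and the IPC ratio (1.0 / 0.6).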

Example - Security and e-commerce
  Secure transactions enable e-commerce
  SSL is the standard for secure Web transactions
    Protocol for secure communication
    Built upon a core set of algorithms
      Public-key encryption: RSA, DSA, Diffie-Hellman, etc.
      Message digest: SHA-1, MD5, etc.
      Digital signature
      Bulk encryption: RC4, DES, 3DES, AES

Security Impacts Performance
  SSL - The basics
  [Diagram: message exchange between Browser and Web Server, in order:
   Client Hello (settings, etc.)
   Server Hello (certificate, suite, etc.)
   Pre-master secret key
   Session ID, Ready
   Session ID, Ready
   Data exchanges (bulk encryption)]

Security Impacts Performance
  The high cost of SSL
  [Chart: Transaction time vs. Kbytes Transmitted. Source - Lab research]
  Secure transactions are orders of magnitude slower than non-secure

Identify Key Algorithms
  Computation in SSL
    Goal: Increase the number of secure transactions
    Identify server performance issues in SSL
    One server may deal with hundreds of clients
  [Chart: compute time consumed by one short SSL transaction, split into setup, Authentication, Bulk Data Encryption, close. Montgomery Product: 70%. Source - Lab research]
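The chart singles out the Montgomery product as the dominant cost, so it may help to see what that operation computes. The following is a textbook sketch of Montgomery multiplication on Python integers, not the implementation measured in the slides. The modulus, the choice of R = 2^64, and all names are illustrative assumptions, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, r_bits):
    """Precompute n' with n * n' == -1 (mod R), where R = 2**r_bits and n is odd."""
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r
    return n_prime

def montgomery_product(a, b, n, r_bits, n_prime):
    """Return a * b * R^-1 mod n for 0 <= a, b < n, using only shifts and masks
    in place of division by n."""
    r_mask = (1 << r_bits) - 1
    t = a * b
    m = ((t & r_mask) * n_prime) & r_mask   # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> r_bits               # exact division by R
    return u - n if u >= n else u           # single conditional subtraction

def modmul_via_montgomery(a, b, n, r_bits):
    """Ordinary a * b mod n, computed by converting to and from Montgomery form."""
    n_prime = montgomery_setup(n, r_bits)
    r = 1 << r_bits
    a_m = (a * r) % n                        # to Montgomery form
    b_m = (b * r) % n
    ab_m = montgomery_product(a_m, b_m, n, r_bits, n_prime)
    return montgomery_product(ab_m, 1, n, r_bits, n_prime)   # back to normal form

N = 0xF123456789ABCDEF        # illustrative odd modulus below 2**64
A, B = 0x0123456789ABCDEF, 0x0FEDCBA987654321
print(hex(modmul_via_montgomery(A, B, N, 64)))
```

In RSA the conversion cost is paid once per exponentiation, after which every squaring and multiplication is a Montgomery product, which is why that one routine dominates the profile.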

Breakthrough Performance on Pentium 4 Processor
  Architectural Features
    New instructions in SSE2
      PMULUDQ (32x32=>64)
      PADDQ (64+64=>64)
      PSHUFD (Re-arrange DWORDs)
      All pipelined SIMD
    Increase size and reduce number of individual multiplications

Breakthrough Performance on Pentium 4 Processor
  Timings
    Algorithm                          Bits    Lang   Clocks   Ratio
    Naïve 32-bit                       1x16    C
    Optimized ASM using MUL            1x32    asm
    Using Pentium 4 New Instruct.      2x32    asm
  Almost 20x performance gain versus naïve implementation
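The instruction list above maps naturally onto multi-precision multiplication: 32x32=>64 multiplies produce partial products and 64-bit adds accumulate them. The sketch below models one lane of each instruction with masked Python integers and uses them in a schoolbook multiply over 32-bit limbs. It is a software model under those assumptions, not the assembly routine timed in the table, and the function names merely borrow the mnemonics.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def pmuludq_lane(a, b):
    """One lane of a 32x32 => 64-bit unsigned multiply."""
    return (a & MASK32) * (b & MASK32)

def paddq_lane(a, b):
    """One lane of a 64+64 => 64-bit add (wraps on overflow)."""
    return (a + b) & MASK64

def to_limbs(value, count):
    """Split a non-negative integer into `count` 32-bit limbs, least significant first."""
    return [(value >> (32 * i)) & MASK32 for i in range(count)]

def from_limbs(limbs):
    """Reassemble an integer from 32-bit limbs, least significant first."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))

def multiply_limbs(a, b):
    """Schoolbook product of two limb arrays; result has len(a) + len(b) limbs."""
    result = [0] * (len(a) + len(b))
    for i, a_limb in enumerate(a):
        carry = 0
        for j, b_limb in enumerate(b):
            product = pmuludq_lane(a_limb, b_limb)
            # Low half plus the running column value and carry never exceeds 64 bits.
            low = paddq_lane(paddq_lane(product & MASK32, result[i + j]), carry)
            result[i + j] = low & MASK32
            carry = paddq_lane(product >> 32, low >> 32)
        result[i + len(b)] = carry
    return result

x = 0x123456789ABCDEF0FEDCBA9876543210
y = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
print(hex(from_limbs(multiply_limbs(to_limbs(x, 4), to_limbs(y, 4)))))
```

Moving from 16-bit to 32-bit limbs cuts the number of partial products for a fixed operand size by a factor of four, which is one reading of "increase size and reduce number of individual multiplications".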

Breakthrough Performance on Pentium 4 Processor
  Summary
    Expect bit RSA Decrypts/second
    Breakthrough performance on public key algorithms for Pentium 4 processor
      The right architecture
      The right instruction set
  Pentium 4 processor delivers more secure transactions to more users


Chapter 12. CPU Structure and Function. Yonsei University

Chapter 12. CPU Structure and Function. Yonsei University Chapter 12 CPU Structure and Function Contents Processor organization Register organization Instruction cycle Instruction pipelining The Pentium processor The PowerPC processor 12-2 CPU Structures Processor

More information

Advanced processor designs

Advanced processor designs Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The

More information

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown

More information

Hyperthreading 3/25/2008. Hyperthreading. ftp://download.intel.com/technology/itj/2002/volume06issue01/art01_hyper/vol6iss1_art01.

Hyperthreading 3/25/2008. Hyperthreading. ftp://download.intel.com/technology/itj/2002/volume06issue01/art01_hyper/vol6iss1_art01. Hyperthreading ftp://download.intel.com/technology/itj/2002/volume06issue01/art01_hyper/vol6iss1_art01.pdf Hyperthreading is a design that makes everybody concerned believe that they are actually using

More information

IA-32 Architecture COE 205. Computer Organization and Assembly Language. Computer Engineering Department

IA-32 Architecture COE 205. Computer Organization and Assembly Language. Computer Engineering Department IA-32 Architecture COE 205 Computer Organization and Assembly Language Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Basic Computer Organization Intel

More information

Simultaneous Multithreading on Pentium 4

Simultaneous Multithreading on Pentium 4 Hyper-Threading: Simultaneous Multithreading on Pentium 4 Presented by: Thomas Repantis trep@cs.ucr.edu CS203B-Advanced Computer Architecture, Spring 2004 p.1/32 Overview Multiple threads executing on

More information

Processors, Performance, and Profiling

Processors, Performance, and Profiling Processors, Performance, and Profiling Architecture 101: 5-Stage Pipeline Fetch Decode Execute Memory Write-Back Registers PC FP ALU Memory Architecture 101 1. Fetch instruction from memory. 2. Decode

More information

Using Intel Streaming SIMD Extensions for 3D Geometry Processing

Using Intel Streaming SIMD Extensions for 3D Geometry Processing Using Intel Streaming SIMD Extensions for 3D Geometry Processing Wan-Chun Ma, Chia-Lin Yang Dept. of Computer Science and Information Engineering National Taiwan University firebird@cmlab.csie.ntu.edu.tw,

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights The Alpha 21264 Microprocessor: Out-of-Order ution at 600 MHz R. E. Kessler Compaq Computer Corporation Shrewsbury, MA 1 Some Highlights Continued Alpha performance leadership 600 MHz operation in 0.35u

More information

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation.

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation. UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation. July 14) (June 2013) (June 2015)(Jan 2016)(June 2016) H/W Support : Conditional Execution Also known

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Page 1. Review: Dynamic Branch Prediction. Lecture 18: ILP and Dynamic Execution #3: Examples (Pentium III, Pentium 4, IBM AS/400)

Page 1. Review: Dynamic Branch Prediction. Lecture 18: ILP and Dynamic Execution #3: Examples (Pentium III, Pentium 4, IBM AS/400) CS252 Graduate Computer Architecture Lecture 18: ILP and Dynamic Execution #3: Examples (Pentium III, Pentium 4, IBM AS/400) April 4, 2001 Prof. David A. Patterson Computer Science 252 Spring 2001 Lec

More information

HW1 Solutions. Type Old Mix New Mix Cost CPI

HW1 Solutions. Type Old Mix New Mix Cost CPI HW1 Solutions Problem 1 TABLE 1 1. Given the parameters of Problem 6 (note that int =35% and shift=5% to fix typo in book problem), consider a strength-reducing optimization that converts multiplies by

More information

Processing Unit CS206T

Processing Unit CS206T Processing Unit CS206T Microprocessors The density of elements on processor chips continued to rise More and more elements were placed on each chip so that fewer and fewer chips were needed to construct

More information

Advanced Processor Architecture

Advanced Processor Architecture Advanced Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong

More information

( ZIH ) Center for Information Services and High Performance Computing. Overvi ew over the x86 Processor Architecture

( ZIH ) Center for Information Services and High Performance Computing. Overvi ew over the x86 Processor Architecture ( ZIH ) Center for Information Services and High Performance Computing Overvi ew over the x86 Processor Architecture Daniel Molka Ulf Markwardt Daniel.Molka@tu-dresden.de ulf.markwardt@tu-dresden.de Outline

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts Computer Architectures Advance CPU Design Tien-Fu Chen National Chung Cheng Univ. Adv CPU-0 MMX technology! Basic concepts " small native data types " compute-intensive operations " a lot of inherent parallelism

More information

Intel 64 and IA-32 Architectures Optimization Reference Manual

Intel 64 and IA-32 Architectures Optimization Reference Manual N Intel 64 and IA-32 Architectures Optimization Reference Manual Order Number: 248966-015 May 2007 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,

More information

Handout 2 ILP: Part B

Handout 2 ILP: Part B Handout 2 ILP: Part B Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level Parallelism Loop unrolling by compiler to increase ILP Branch prediction to increase ILP

More information

Intel 64 and IA-32 Architectures Optimization Reference Manual

Intel 64 and IA-32 Architectures Optimization Reference Manual N Intel 64 and IA-32 Architectures Optimization Reference Manual Order Number: 248966-014 November 2006 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture 1 L E C T U R E 4: D A T A S T R E A M S I N S T R U C T I O N E X E C U T I O N I N S T R U C T I O N C O M P L E T I O N & R E T I R E M E N T D A T A F L O W & R E G I

More information

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections ) Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections 3.8-3.14) 1 ILP Limits The perfect processor: Infinite registers (no WAW or WAR hazards) Perfect branch direction and target

More information

CS 333 Introduction to Operating Systems Class 2 OS-Related Hardware & Software The Process Concept

CS 333 Introduction to Operating Systems Class 2 OS-Related Hardware & Software The Process Concept CS 333 Introduction to Operating Systems Class 2 OS-Related Hardware & Software The Process Concept Jonathan Walpole Computer Science Portland State University 1 Lecture 2 overview OS-Related Hardware

More information

Basic Computer Architecture

Basic Computer Architecture Basic Computer Architecture CSCE 496/896: Embedded Systems Witawas Srisa-an Review of Computer Architecture Credit: Most of the slides are made by Prof. Wayne Wolf who is the author of the textbook. I

More information

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Advanced Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000

More information

Jim Keller. Digital Equipment Corp. Hudson MA

Jim Keller. Digital Equipment Corp. Hudson MA Jim Keller Digital Equipment Corp. Hudson MA ! Performance - SPECint95 100 50 21264 30 21164 10 1995 1996 1997 1998 1999 2000 2001 CMOS 5 0.5um CMOS 6 0.35um CMOS 7 0.25um "## Continued Performance Leadership

More information

2.5 Address Space. The IBM 6x86 CPU can directly address 64 KBytes of I/O space and 4 GBytes of physical memory (Figure 2-24).

2.5 Address Space. The IBM 6x86 CPU can directly address 64 KBytes of I/O space and 4 GBytes of physical memory (Figure 2-24). Address Space 2.5 Address Space The IBM 6x86 CPU can directly address 64 KBytes of I/O space and 4 GBytes of physical memory (Figure 2-24). Memory Address Space. Access can be made to memory addresses

More information

IA-32 Intel Architecture Software Developer s Manual

IA-32 Intel Architecture Software Developer s Manual IA-32 Intel Architecture Software Developer s Manual Volume 1: Basic Architecture NOTE: The IA-32 Intel Architecture Software Developer s Manual consists of three volumes: Basic Architecture, Order Number

More information

Concurrent High Performance Processor design: From Logic to PD in Parallel

Concurrent High Performance Processor design: From Logic to PD in Parallel IBM Systems Group Concurrent High Performance design: From Logic to PD in Parallel Leon Stok, VP EDA, IBM Systems Group Mainframes process 30 billion business transactions per day The mainframe is everywhere,

More information

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1

Module 2. Embedded Processors and Memory. Version 2 EE IIT, Kharagpur 1 Module 2 Embedded Processors and Memory Version 2 EE IIT, Kharagpur 1 Lesson 8 General Purpose Processors - I Version 2 EE IIT, Kharagpur 2 In this lesson the student will learn the following Architecture

More information

The Pentium II/III Processor Compiler on a Chip

The Pentium II/III Processor Compiler on a Chip The Pentium II/III Processor Compiler on a Chip Ronny Ronen Senior Principal Engineer Director of Architecture Research Intel Labs - Haifa Intel Corporation Tel Aviv University January 20, 2004 1 Agenda

More information

Semester paper for CSE 3322, Fall Memory Hierarchies. vs. By : Login : Date : Nov 8 th, Director: Professor Al-Khaiyat TA : Mr.

Semester paper for CSE 3322, Fall Memory Hierarchies. vs. By : Login : Date : Nov 8 th, Director: Professor Al-Khaiyat TA : Mr. Memory Hierarchies vs. By : Login : Date : Nov 8 th, 1999 Director: Professor Al-Khaiyat TA : Mr. Byung Sung 1 Introduction: As a semester paper for computer sciences architecture course, this paper describe

More information

Portland State University ECE 587/687. The Microarchitecture of Superscalar Processors

Portland State University ECE 587/687. The Microarchitecture of Superscalar Processors Portland State University ECE 587/687 The Microarchitecture of Superscalar Processors Copyright by Alaa Alameldeen and Haitham Akkary 2011 Program Representation An application is written as a program,

More information

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism

CS 252 Graduate Computer Architecture. Lecture 4: Instruction-Level Parallelism CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://wwweecsberkeleyedu/~krste

More information

Performance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model.

Performance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model. Performance of Computer Systems CSE 586 Computer Architecture Review Jean-Loup Baer http://www.cs.washington.edu/education/courses/586/00sp Performance metrics Use (weighted) arithmetic means for execution

More information

There are different characteristics for exceptions. They are as follows:

There are different characteristics for exceptions. They are as follows: e-pg PATHSHALA- Computer Science Computer Architecture Module 15 Exception handling and floating point pipelines The objectives of this module are to discuss about exceptions and look at how the MIPS architecture

More information

HPC VT Machine-dependent Optimization

HPC VT Machine-dependent Optimization HPC VT 2013 Machine-dependent Optimization Last time Choose good data structures Reduce number of operations Use cheap operations strength reduction Avoid too many small function calls inlining Use compiler

More information

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling)

Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) 18-447 Computer Architecture Lecture 12: Out-of-Order Execution (Dynamic Instruction Scheduling) Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 2/13/2015 Agenda for Today & Next Few Lectures

More information

Limitations of Scalar Pipelines

Limitations of Scalar Pipelines Limitations of Scalar Pipelines Superscalar Organization Modern Processor Design: Fundamentals of Superscalar Processors Scalar upper bound on throughput IPC = 1 Inefficient unified pipeline

More information

Superscalar Processors

Superscalar Processors Superscalar Processors Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any a performance

More information

IA-32 Intel Architecture Optimization

IA-32 Intel Architecture Optimization IA-32 Intel Architecture Optimization Reference Manual Issued in U.S.A. Order Number: 248966-009 World Wide Web: http://developer.intel.com INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL

More information

Datapoint 2200 IA-32. main memory. components. implemented by Intel in the Nicholas FitzRoy-Dale

Datapoint 2200 IA-32. main memory. components. implemented by Intel in the Nicholas FitzRoy-Dale Datapoint 2200 IA-32 Nicholas FitzRoy-Dale At the forefront of the computer revolution - Intel Difficult to explain and impossible to love - Hennessy and Patterson! Released 1970! 2K shift register main

More information

5008: Computer Architecture

5008: Computer Architecture 5008: Computer Architecture Chapter 2 Instruction-Level Parallelism and Its Exploitation CA Lecture05 - ILP (cwliu@twins.ee.nctu.edu.tw) 05-1 Review from Last Lecture Instruction Level Parallelism Leverage

More information

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars

CS 152 Computer Architecture and Engineering. Lecture 12 - Advanced Out-of-Order Superscalars CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory

More information

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation

Donn Morrison Department of Computer Science. TDT4255 ILP and speculation TDT4255 Lecture 9: ILP and speculation Donn Morrison Department of Computer Science 2 Outline Textbook: Computer Architecture: A Quantitative Approach, 4th ed Section 2.6: Speculation Section 2.7: Multiple

More information

Superscalar Machines. Characteristics of superscalar processors

Superscalar Machines. Characteristics of superscalar processors Superscalar Machines Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any performance

More information

Computer Science 146. Computer Architecture

Computer Science 146. Computer Architecture Computer Architecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 12: Hardware Assisted Software ILP and IA64/Itanium Case Study Lecture Outline Review of Global Scheduling,

More information

Computer Architecture Lecture 14: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013

Computer Architecture Lecture 14: Out-of-Order Execution. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013 18-447 Computer Architecture Lecture 14: Out-of-Order Execution Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/18/2013 Reminder: Homework 3 Homework 3 Due Feb 25 REP MOVS in Microprogrammed

More information

The Pentium Processor

The Pentium Processor The Pentium Processor Chapter 7 S. Dandamudi Outline Pentium family history Pentium processor details Pentium registers Data Pointer and index Control Segment Real mode memory architecture Protected mode

More information

Virtual Memory. Virtual Memory

Virtual Memory. Virtual Memory Virtual Memory Virtual Memory Main memory is cache for secondary storage Secondary storage (disk) holds the complete virtual address space Only a portion of the virtual address space lives in the physical

More information

Main Points of the Computer Organization and System Software Module

Main Points of the Computer Organization and System Software Module Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a

More information

Memory management units

Memory management units Memory management units Memory management unit (MMU) translates addresses: CPU logical address memory management unit physical address main memory Computers as Components 1 Access time comparison Media

More information

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University

EE382A Lecture 7: Dynamic Scheduling. Department of Electrical Engineering Stanford University EE382A Lecture 7: Dynamic Scheduling Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 7-1 Announcements Project proposal due on Wed 10/14 2-3 pages submitted

More information

Hercules ARM Cortex -R4 System Architecture. Processor Overview

Hercules ARM Cortex -R4 System Architecture. Processor Overview Hercules ARM Cortex -R4 System Architecture Processor Overview What is Hercules? TI s 32-bit ARM Cortex -R4/R5 MCU family for Industrial, Automotive, and Transportation Safety Hardware Safety Features

More information

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 GBI0001@AUBURN.EDU ELEC 6200-001: Computer Architecture and Design Silicon Technology Moore s law Moore's Law describes a long-term trend in the history

More information

Portland State University ECE 588/688. IBM Power4 System Microarchitecture

Portland State University ECE 588/688. IBM Power4 System Microarchitecture Portland State University ECE 588/688 IBM Power4 System Microarchitecture Copyright by Alaa Alameldeen 2018 IBM Power4 Design Principles SMP optimization Designed for high-throughput multi-tasking environments

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information