A 1.5GHz Third Generation Itanium Processor

Size: px
Start display at page:

Download "A 1.5GHz Third Generation Itanium Processor"

Transcription

1 A 1.5GHz Third Generation Itanium Processor Jason Stinson, Stefan Rusu Intel Corporation, Santa Clara, CA 1

2 Outline Processor highlights Process technology details Itanium processor evolution Block diagram Cache circuit design details Package details Front-side bus interface Clock distribution Power dissipation RAS, DFT and DFM features Frequency shmoo Summary 2

3 130nm Itanium 2 Processor Highlights 410M transistors 374mm 2 die size 6MB on-die L3 cache 1.5GHz at 1.3V 6.4GB/s 400MT/s 4-way bus interface Plug-in compatible with existing platforms Extensive RAS, DFT and DFM features Largest microprocessor transistor count and on-die cache 3

4 130nm Process Characteristics Attribute Lgate M1 pitch M2 pitch M3 pitch M4 pitch M5 pitch M6 pitch Dielectric Memory cell Value 60nm 350nm 448nm 448nm 756nm 1120nm 1204nm FSG, K= µm 2 S. Thompson et al, IEDM 2001 (top) S. Tyagi, et al, IEDM 2000 (bottom) 4

5 Itanium Processor Evolution Attribute Architecture Process Device Count On-die L3 cache Frequency Itanium Processor Explicitly Parallel Instruction Computing 180nm 25M 0* 800MHz Itanium 2 Processor 180nm 221M 3MB 1.0GHz This work 130nm 410M 6MB 1.5GHz Supply Voltage 1.6V 1.5V 1.3V Power 130W* 130W 130W * Includes 4MB cache on cartridge 5

6 Block Diagram L3 Cache ECC ECC ECC L2 Cache Quad Port ECC Branch Prediction 11 Issue Ports Scoreboard, Predicate, NATs, Exceptions L1 Instruction Cache and Fetch/Pre-fetch Engine Instruction Queue ITLB B B B M M M M I I F F Register Stack Engine / Re-Mapping Branch & Predicate Registers Branch Units IA-32 Decode and Control 128 Integer Registers 128 FP Registers Integer and MM Units 8 bundles Quad-Port L1 Data Cache and DTLB ALAT Floating Point Units ECC ECC Bus (128b data, 6

7 Cache Summary Attribute L1I L1D L2 L3 Size 16K 16K 256K 6M Line Size 64B 64B 128B 128B Ways Replacement LRU NRU NRU NRU Latency 1-Fetch:1 INT:1 FP: NA INT: 5 FP: 6 14 Write Policy - WT (RA) WB (WA) WB (WA) Bandwidth R: 48GBs R: 24GBs W: 24GBs R: 48GBs W: 48GBs R: 48GBs W: 48GBs Cache bandwidth increased by 50% L3 cache latency increased to 14 clocks and set associativity doubled 7

8 L1 Instruction Cache Circuit Detail p0wl p1wl p1_local_bl p0_local_bl p1_global_bl p0_global_bl p1_write_data write_en# fill_cycle_1 p0_write_data coreclock fill_cycle_1 write_en# p1_local_bl write-1 read-2 read-3 p0_local_bl read-1 write-1 8

9 L1 Instruction Cache Circuit Detail p0wl p1wl p1_local_bl p0_local_bl p1_global_bl p0_global_bl p1_write_data write_en# fill_cycle_1 p0_write_data coreclock fill_cycle_1 write_en# p1_local_bl write-1 read-2 read-3 p0_local_bl read-1 write-1 9

10 L1 Instruction Cache Circuit Detail p0wl p1wl p1_local_bl p0_local_bl p1_global_bl p0_global_bl p1_write_data write_en# fill_cycle_1 p0_write_data coreclock fill_cycle_1 write_en# p1_local_bl write-1 read-2 read-3 p0_local_bl read-1 write-1 10

11 Way[11:0] BLOCK7 BLOCK6 Way[23:12] BLOCK7 BLOCK6 Decoders Sense amplifiers Timers L3 Subarray BLOCK5 BLOCK5 Core BLOCK4 BLOCK4 1588u BLOCK3 I/O mux BLOCK3 L3 Cache BLOCK2 BLOCK1 BLOCK0 BLOCK2 BLOCK1 BLOCK0 L3 cache contains 140 subarrays tiled to fit irregular shape of core 776u 11

12 Package Details Power delivery connector Server Management Components Flip-Chip BGA package with Integrated Heat Spreader Interposer Substrate 12

13 Package Decoupling Vdd (core) Vtt (FSB) PLL filter Power delivery connector 13

14 Front Side Bus P1 P3 CS P2 P4 Interface Support System Topology Termination Voltage Voltage Reference Data Bus Width Data Bus Speed Data Strobes Peak BW Address, Control Speed Glueless 4-way Multi-Processor Dual-sided board, staggered vias 1.2V, common ground with core Ground-referenced, 0.75V Vref 128-bit 400MT/s source synchronous 1 differential strobe for 16b of data 6.4GB/s 200MHz common clock 14

15 Front-Side Bus Topology Previous implementation Four linear stripes This work U-shape Core Core L3 Cache L3 Cache Data I/O Address I/O Control I/O 15

16 Itanium Processor Clock Distribution Trends Attribute Itanium Processor Itanium 2 Processor This work Process/Metal 180nm / Al 180nm / Al 130nm / Cu Primary tree Single ended Differential Differential Local distribution Grid Tree Tree Clock skew [ps] 28ps 62ps 24ps Deskew Method Active On-demand Fuse-based 16

17 Fuse-Based Clock Deskew ITC (TAP) SLCB scan chain SLCB Ck Gen Unit PD SLCB SLCBO zones FUSE Unit SLCB FUSE settings SLCB (3-bit/SLCB) SLCB = Second Level Clock Buffer 17

18 Skew (ps) Clock Zone Skew Plot No deskew Fuse-based deskew Scan-based deskew Clock Zone Worst case clock skew is 24ps in fuse mode and 7ps in scan mode 18

19 Power Same 130W power envelope as the 0.18um Itanium 2 processor 50% frequency increase 2X larger L3 cache Leakage increased 3.5X 5% 14% 7% 74% This work dynamic power I/O power core leakage/static cache leakage Aggressive management of dynamic power Reduced clock loading Reduced contention power L3 cache power management 5% 5% 90% Itanium 2 Processor dynamic power I/O power all leakage/static 19

20 L3D Power Reduction Scheme Previous Implementation This work Index [4:3] Index [4:3] == 11 Bank 1 Request Index [4:3] == 10 Bank 0 Request Index [4:3] == 01 Index [4:3] == Bank 1 Request Bank 0 Request Data in [7:0] Data out [7:0] 2 Data in [7:0] Data out [7:0]

21 Reliability Features Core Pipeline L1I (16KB) L2 (256KB) Data L3 (6MB) Data + Tag L1D (16KB) Tag Bus Unit Error Log Timer ECC protected ITLB1 (32) DTLB1 (32) IPC (128) DTLB2 (128) HW Page Walker Bus Queues Data Poisoning Data Bus Addr Bus Parity protected System bus 21

22 DFT/DFM Feature Summary Feature Itanium Processor Itanium 2 Processor This work Scan Coverage 48K 140K 140K Scanout Coverage 5.5K 24K 24K Cache DAT Mode (major arrays) Yes Yes Yes L3 Redundancy / Repair N/A Dual Quad Weak-Write Test Mode Fixed Fixed Programmable IO DFT Basic IO Loopback Limited IO Loopback Enhanced IO Loopback Dynamic Frequency Adjustment Multi-cycle shrink/stretch Single cycle shrink/stretch Multi-cycle shrink/stretch On-die process monitors No No Yes 22

23 Thermal Protection Features Temperature Thermal Alert ETM Thermal Trip I res I diode ETM Throttle Thermal Trip Time Diode Reference 23

24 Frequency Shmoo PASS FAIL 1:9 Bus Period (ns) Core Frequency 1.80GHz 1.73GHz 1.66GHz 1.61GHz 1.55GHz 1.50GHz 1.45GHz 1.40GHz Frequency increased by 50% from previous generation 24

25 Summary The 2003 version of the Itanium 2 processor (Madison) delivers 2X larger on-die cache and 50% increase in frequency Compatible with today s Itanium 2-based systems Enterprise-class RAS, DFT and DFM features Largest on-die cache and transistor count ever reported for a microprocessor 6.4GB/s multi-drop bus interface Clock de-skew technique achieves 24ps skew in fuse-mode and 7ps in scan mode across entire die 25

New 130nm Itanium 2 Processors for 2003

New 130nm Itanium 2 Processors for 2003 New 130nm Itanium s for 003 Harry Muljono, Stefan usu, Brian Cherkauer, Jason Stinson Intel Corporation, Santa Clara, CA 1 Outline highlights Itanium processor evolution Block diagram Power dissipation

More information

A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache

A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache Stefan Rusu Intel Corporation Santa Clara, CA Intel and the Intel logo are registered trademarks of Intel Corporation or its subsidiaries in

More information

Sam Naffziger. Gary Hammond. Next Generation Itanium Processor Overview. Lead Circuit Architect Microprocessor Technology Lab HP Corporation

Sam Naffziger. Gary Hammond. Next Generation Itanium Processor Overview. Lead Circuit Architect Microprocessor Technology Lab HP Corporation Next Generation Itanium Processor Overview Gary Hammond Principal Architect Enterprise Platform Group Corporation August 27-30, 2001 Sam Naffziger Lead Circuit Architect Microprocessor Technology Lab HP

More information

Itanium 2 Processor Microarchitecture Overview

Itanium 2 Processor Microarchitecture Overview Itanium 2 Processor Microarchitecture Overview Don Soltis, Mark Gibson Cameron McNairy, August 2002 Block Diagram F 16KB L1 I-cache Instr 2 Instr 1 Instr 0 M/A M/A M/A M/A I/A Template I/A B B 2 FMACs

More information

Update for New Implementations. As new implementations of the Itanium architecture

Update for New Implementations. As new implementations of the Itanium architecture U Update for New Implementations As new implementations of the Itanium architecture are announced, we attempt to post appropriate updates on the support page for Itanium Architecture for Programmers: Understanding

More information

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation.

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation. UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation. July 14) (June 2013) (June 2015)(Jan 2016)(June 2016) H/W Support : Conditional Execution Also known

More information

2GB DDR3 SDRAM 72bit SO-DIMM

2GB DDR3 SDRAM 72bit SO-DIMM 2GB 72bit SO-DIMM Speed Max CAS Component Number of Part Number Bandwidth Density Organization Grade Frequency Latency Composition Rank 78.A2GCF.AF10C 10.6GB/sec 1333Mbps 666MHz CL9 2GB 256Mx72 256Mx8

More information

Jim Keller. Digital Equipment Corp. Hudson MA

Jim Keller. Digital Equipment Corp. Hudson MA Jim Keller Digital Equipment Corp. Hudson MA ! Performance - SPECint95 100 50 21264 30 21164 10 1995 1996 1997 1998 1999 2000 2001 CMOS 5 0.5um CMOS 6 0.35um CMOS 7 0.25um "## Continued Performance Leadership

More information

Basics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS

Basics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS Basics DRAM ORGANIZATION DRAM Word Line Bit Line Storage element (capacitor) In/Out Buffers Decoder Sense Amps... Bit Lines... Switching element Decoder... Word Lines... Memory Array Page 1 Basics BUS

More information

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 GBI0001@AUBURN.EDU ELEC 6200-001: Computer Architecture and Design Silicon Technology Moore s law Moore's Law describes a long-term trend in the history

More information

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Several Common Compiler Strategies Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Basic Instruction Scheduling Reschedule the order of the instructions to reduce the

More information

Gemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc.

Gemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc. Gemini: A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor Sanjiv Kapil Gemini Architect Sun Microsystems, Inc. Design Goals Designed for compute-dense, transaction oriented systems (webservers,

More information

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5

More information

Gigascale Integration Design Challenges & Opportunities. Shekhar Borkar Circuit Research, Intel Labs October 24, 2004

Gigascale Integration Design Challenges & Opportunities. Shekhar Borkar Circuit Research, Intel Labs October 24, 2004 Gigascale Integration Design Challenges & Opportunities Shekhar Borkar Circuit Research, Intel Labs October 24, 2004 Outline CMOS technology challenges Technology, circuit and μarchitecture solutions Integration

More information

OPENSPARC T1 OVERVIEW

OPENSPARC T1 OVERVIEW Chapter Four OPENSPARC T1 OVERVIEW Denis Sheahan Distinguished Engineer Niagara Architecture Group Sun Microsystems Creative Commons 3.0United United States License Creative CommonsAttribution-Share Attribution-Share

More information

Intel Enterprise Processors Technology

Intel Enterprise Processors Technology Enterprise Processors Technology Kosuke Hirano Enterprise Platforms Group March 20, 2002 1 Agenda Architecture in Enterprise Xeon Processor MP Next Generation Itanium Processor Interconnect Technology

More information

EECS 322 Computer Architecture Superpipline and the Cache

EECS 322 Computer Architecture Superpipline and the Cache EECS 322 Computer Architecture Superpipline and the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow Summary:

More information

Intel released new technology call P6P

Intel released new technology call P6P P6 and IA-64 8086 released on 1978 Pentium release on 1993 8086 has upgrade by Pipeline, Super scalar, Clock frequency, Cache and so on But 8086 has limit, Hard to improve efficiency Intel released new

More information

IMM128M72D1SOD8AG (Die Revision F) 1GByte (128M x 72 Bit)

IMM128M72D1SOD8AG (Die Revision F) 1GByte (128M x 72 Bit) Product Specification Rev. 1.0 2015 IMM128M72D1SOD8AG (Die Revision F) 1GByte (128M x 72 Bit) 1GB DDR Unbuffered SO-DIMM RoHS Compliant Product Product Specification 1.0 1 IMM128M72D1SOD8AG Version: Rev.

More information

Real Time Embedded Systems

Real Time Embedded Systems Real Time Embedded Systems " Memories " rene.beuchat@epfl.ch LAP/ISIM/IC/EPFL Chargé de cours LSN/hepia Prof. HES 1998-2008 2 General classification of electronic memories Non-volatile Memories ROM PROM

More information

POWER7: IBM's Next Generation Server Processor

POWER7: IBM's Next Generation Server Processor Hot Chips 21 POWER7: IBM's Next Generation Server Processor Ronald Kalla Balaram Sinharoy POWER7 Chief Engineer POWER7 Chief Core Architect Acknowledgment: This material is based upon work supported by

More information

KiloCore: A 32 nm 1000-Processor Array

KiloCore: A 32 nm 1000-Processor Array KiloCore: A 32 nm 1000-Processor Array Brent Bohnenstiehl, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo, Bevan Baas University of California, Davis VLSI Computation

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits

More information

IMM64M72D1SCS8AG (Die Revision D) 512MByte (64M x 72 Bit)

IMM64M72D1SCS8AG (Die Revision D) 512MByte (64M x 72 Bit) Product Specification Rev. 1.0 2015 IMM64M72D1SCS8AG (Die Revision D) 512MByte (64M x 72 Bit) RoHS Compliant Product Product Specification 1.0 1 IMM64M72D1SCS8AG Version: Rev. 1.0, MAY 2015 1.0 - Initial

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA The Alpha 21264 Microprocessor: Out-of-Order ution at 600 Mhz R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA 1 Some Highlights z Continued Alpha performance leadership y 600 Mhz operation in

More information

DDR SDRAM UDIMM. Draft 9/ 9/ MT18VDDT6472A 512MB 1 MT18VDDT12872A 1GB For component data sheets, refer to Micron s Web site:

DDR SDRAM UDIMM. Draft 9/ 9/ MT18VDDT6472A 512MB 1 MT18VDDT12872A 1GB For component data sheets, refer to Micron s Web site: DDR SDRAM UDIMM MT18VDDT6472A 512MB 1 MT18VDDT12872A 1GB For component data sheets, refer to Micron s Web site: www.micron.com 512MB, 1GB (x72, ECC, DR) 184-Pin DDR SDRAM UDIMM Features Features 184-pin,

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture

More information

DDR3 Memory Buffer: Buffer at the Heart of the LRDIMM Architecture. Paul Washkewicz Vice President Marketing, Inphi

DDR3 Memory Buffer: Buffer at the Heart of the LRDIMM Architecture. Paul Washkewicz Vice President Marketing, Inphi DDR3 Memory Buffer: Buffer at the Heart of the LRDIMM Architecture Paul Washkewicz Vice President Marketing, Inphi Theme Challenges with Memory Bandwidth Scaling How LRDIMM Addresses this Challenge Under

More information

Computer Science 146. Computer Architecture

Computer Science 146. Computer Architecture Computer Architecture Spring 2004 Harvard University Instructor: Prof. dbrooks@eecs.harvard.edu Lecture 12: Hardware Assisted Software ILP and IA64/Itanium Case Study Lecture Outline Review of Global Scheduling,

More information

POWER7: IBM's Next Generation Server Processor

POWER7: IBM's Next Generation Server Processor POWER7: IBM's Next Generation Server Processor Acknowledgment: This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002 Outline

More information

Interconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp

Interconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp Interconnect Challenges in a Many Core Compute Environment Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp Agenda Microprocessor general trends Implications Tradeoffs Summary

More information

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor Single-Issue Processor (AKA Scalar Processor) CPI IPC 1 - One At Best 1 - One At best 1 From Single-Issue to: AKS Scalar Processors CPI < 1? How? Multiple issue processors: VLIW (Very Long Instruction

More information

IMM64M64D1SOD16AG (Die Revision D) 512MByte (64M x 64 Bit)

IMM64M64D1SOD16AG (Die Revision D) 512MByte (64M x 64 Bit) Product Specification Rev. 2.0 2015 IMM64M64D1SOD16AG (Die Revision D) 512MByte (64M x 64 Bit) 512MB DDR Unbuffered SO-DIMM RoHS Compliant Product Product Specification 2.0 1 IMM64M64D1SOD16AG Version:

More information

A 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS

A 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS A 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS ABSTRACT We describe L1 cache designed for digital signal processor (DSP) core. The cache is 32KB with variable associativity (4 to 16 ways) and is pseudo-dual-ported.

More information

The University of Adelaide, School of Computer Science 13 September 2018

The University of Adelaide, School of Computer Science 13 September 2018 Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per

More information

4GB Unbuffered VLP DDR3 SDRAM DIMM with SPD

4GB Unbuffered VLP DDR3 SDRAM DIMM with SPD 4GB Unbuffered VLP DDR3 SDRAM DIMM with SPD Ordering Information Part Number Bandwidth Speed Grade Max Frequency CAS Latency Density Organization Component Composition 78.B1GE3.AFF0C 12.8GB/sec 1600Mbps

More information

1. NoCs: What s the point?

1. NoCs: What s the point? 1. Nos: What s the point? What is the role of networks-on-chip in future many-core systems? What topologies are most promising for performance? What about for energy scaling? How heavily utilized are Nos

More information

2GB DDR3 SDRAM SODIMM with SPD

2GB DDR3 SDRAM SODIMM with SPD 2GB DDR3 SDRAM SODIMM with SPD Ordering Information Part Number Bandwidth Speed Grade Max Frequency CAS Latency Density Organization Component Composition Number of Rank 78.A2GC6.AF1 10.6GB/sec 1333Mbps

More information

IMM128M64D1DVD8AG (Die Revision F) 1GByte (128M x 64 Bit)

IMM128M64D1DVD8AG (Die Revision F) 1GByte (128M x 64 Bit) Product Specification Rev. 1.0 2015 IMM128M64D1DVD8AG (Die Revision F) 1GByte (128M x 64 Bit) 1GB DDR VLP Unbuffered DIMM RoHS Compliant Product Product Specification 1.0 1 IMM128M64D1DVD8AG Version: Rev.

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,

More information

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor The A High Performance Out-of-Order Processor Hot Chips VIII IEEE Computer Society Stanford University August 19, 1996 Hewlett-Packard Company Engineering Systems Lab - Fort Collins, CO - Cupertino, CA

More information

The Memory Hierarchy 1

The Memory Hierarchy 1 The Memory Hierarchy 1 What is a cache? 2 What problem do caches solve? 3 Memory CPU Abstraction: Big array of bytes Memory memory 4 Performance vs 1980 Processor vs Memory Performance Memory is very slow

More information

SPARC64 X: Fujitsu s New Generation 16 Core Processor for the next generation UNIX servers

SPARC64 X: Fujitsu s New Generation 16 Core Processor for the next generation UNIX servers X: Fujitsu s New Generation 16 Processor for the next generation UNIX servers August 29, 2012 Takumi Maruyama Processor Development Division Enterprise Server Business Unit Fujitsu Limited All Rights Reserved,Copyright

More information

COSC 6385 Computer Architecture - Memory Hierarchies (II)

COSC 6385 Computer Architecture - Memory Hierarchies (II) COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity

More information

Organization Row Address Column Address Bank Address Auto Precharge 256Mx4 (1GB) based module A0-A13 A0-A9 BA0-BA2 A10

Organization Row Address Column Address Bank Address Auto Precharge 256Mx4 (1GB) based module A0-A13 A0-A9 BA0-BA2 A10 GENERAL DESCRIPTION The Gigaram GR2DR4BD-E4GBXXXVLP is a 512M bit x 72 DDDR2 SDRAM high density ECC REGISTERED DIMM. The GR2DR4BD-E4GBXXXVLP consists of eighteen CMOS 512M x 4 STACKED DDR2 SDRAMs for 4GB

More information

Lecture: Large Caches, Virtual Memory. Topics: cache innovations (Sections 2.4, B.4, B.5)

Lecture: Large Caches, Virtual Memory. Topics: cache innovations (Sections 2.4, B.4, B.5) Lecture: Large Caches, Virtual Memory Topics: cache innovations (Sections 2.4, B.4, B.5) 1 Techniques to Reduce Cache Misses Victim caches Better replacement policies pseudo-lru, NRU Prefetching, cache

More information

Design Objectives of the 0.35µm Alpha Microprocessor (A 500MHz Quad Issue RISC Microprocessor)

Design Objectives of the 0.35µm Alpha Microprocessor (A 500MHz Quad Issue RISC Microprocessor) Design Objectives of the 0.35µm Alpha 21164 Microprocessor (A 500MHz Quad Issue RISC Microprocessor) Gregg Bouchard Digital Semiconductor Digital Equipment Corporation Hudson, MA 1 Outline 0.35µm Alpha

More information

POWER4 Test Chip. Bradley D. McCredie Senior Technical Staff Member IBM Server Group, Austin. August 14, 1999

POWER4 Test Chip. Bradley D. McCredie Senior Technical Staff Member IBM Server Group, Austin. August 14, 1999 Bradley D. McCredie Senior Technical Staff Member Server Group, Austin August 14, 1999 Presentation Overview Design objectives Chip overview Technology Circuits Implementation Results Test Chip Objectives

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

Organization Row Address Column Address Bank Address Auto Precharge 128Mx8 (1GB) based module A0-A13 A0-A9 BA0-BA2 A10

Organization Row Address Column Address Bank Address Auto Precharge 128Mx8 (1GB) based module A0-A13 A0-A9 BA0-BA2 A10 GENERAL DESCRIPTION The Gigaram is ECC Registered Dual-Die DIMM with 1.25inch (30.00mm) height based on DDR2 technology. DIMMs are available as ECC modules in 256Mx72 (2GByte) organization and density,

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 24

ECE 571 Advanced Microprocessor-Based Design Lecture 24 ECE 571 Advanced Microprocessor-Based Design Lecture 24 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 25 April 2013 Project/HW Reminder Project Presentations. 15-20 minutes.

More information

M2U1G64DS8HB1G and M2Y1G64DS8HB1G are unbuffered 200-Pin Double Data Rate (DDR) Synchronous DRAM Unbuffered Dual In-Line

M2U1G64DS8HB1G and M2Y1G64DS8HB1G are unbuffered 200-Pin Double Data Rate (DDR) Synchronous DRAM Unbuffered Dual In-Line 184 pin Based on DDR400/333 512M bit Die B device Features 184 Dual In-Line Memory Module (DIMM) based on 110nm 512M bit die B device Performance: Speed Sort PC2700 PC3200 6K DIMM Latency 25 3 5T Unit

More information

Design of Clock Distribution in High Performance Processors

Design of Clock Distribution in High Performance Processors Design of Clock Distribution in High Performance Processors Ian Young Intel Senior Fellow and Director, Advanced Circuits and Technology Integration (ACTI) Technology Manufacturing Group (TMG) Intel Corporation,

More information

IMME256M64D2SOD8AG (Die Revision E) 2GByte (256M x 64 Bit)

IMME256M64D2SOD8AG (Die Revision E) 2GByte (256M x 64 Bit) Product Specification Rev. 1.0 2015 IMME256M64D2SOD8AG (Die Revision E) 2GByte (256M x 64 Bit) 2GB DDR2 Unbuffered SO-DIMM By ECC DRAM RoHS Compliant Product Product Specification 1.0 1 IMME256M64D2SOD8AG

More information

COSC 6385 Computer Architecture - Memory Hierarchies (III)

COSC 6385 Computer Architecture - Memory Hierarchies (III) COSC 6385 Computer Architecture - Memory Hierarchies (III) Edgar Gabriel Spring 2014 Memory Technology Performance metrics Latency problems handled through caches Bandwidth main concern for main memory

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 19 Advanced Processors III 2006-11-2 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/ 1 Last

More information

BOBCAT: AMD S LOW-POWER X86 PROCESSOR

BOBCAT: AMD S LOW-POWER X86 PROCESSOR ARCHITECTURES FOR MULTIMEDIA SYSTEMS PROF. CRISTINA SILVANO LOW-POWER X86 20/06/2011 AMD Bobcat Small, Efficient, Low Power x86 core Excellent Performance Synthesizable with smaller number of custom arrays

More information

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation Mainstream Computer System Components CPU Core 2 GHz - 3.0 GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation One core or multi-core (2-4) per chip Multiple FP, integer

More information

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed. Lecture 13: SRAM Slides courtesy of Deming Chen Slides based on the initial set from David Harris CMOS VLSI Design Outline Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports

More information

Each Milliwatt Matters

Each Milliwatt Matters Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets

More information

Future of Interconnect Fabric A Contrarian View. Shekhar Borkar June 13, 2010 Intel Corp. 1

Future of Interconnect Fabric A Contrarian View. Shekhar Borkar June 13, 2010 Intel Corp. 1 Future of Interconnect Fabric A ontrarian View Shekhar Borkar June 13, 2010 Intel orp. 1 Outline Evolution of interconnect fabric On die network challenges Some simple contrarian proposals Evaluation and

More information

LE4ASS21PEH 16GB Unbuffered 2048Mx64 DDR4 SO-DIMM 1.2V Up to PC CL

LE4ASS21PEH 16GB Unbuffered 2048Mx64 DDR4 SO-DIMM 1.2V Up to PC CL LE4ASS21PEH 16GB Unbuffered 2048Mx64 DDR4 SO-DIMM 1.2V Up to PC4-2133 CL 15-15-15 General Description This Legacy device is a JEDEC standard unbuffered SO-DIMM module, based on CMOS DDR4 SDRAM technology,

More information

256MEGX72 (DDR2-SOCDIMM W/PLL)

256MEGX72 (DDR2-SOCDIMM W/PLL) www.centon.com MEMORY SPECIFICATIONS 256MEGX72 (DDR2-SOCDIMM W/PLL) 128MX8 BASED Lead-Free 268,435,456 words x 72Bit Double Data Rate Memory Module Centon's 2GB DDR2 ECC PLL SODIMM Memory Module is 268,435,456

More information

EEM 486: Computer Architecture. Lecture 9. Memory

EEM 486: Computer Architecture. Lecture 9. Memory EEM 486: Computer Architecture Lecture 9 Memory The Big Picture Designing a Multiple Clock Cycle Datapath Processor Control Memory Input Datapath Output The following slides belong to Prof. Onur Mutlu

More information

DDR2 SDRAM UDIMM MT8HTF12864AZ 1GB

DDR2 SDRAM UDIMM MT8HTF12864AZ 1GB Features DDR2 SDRAM UDIMM MT8HTF12864AZ 1GB For component data sheets, refer to Micron's Web site: www.micron.com Figure 1: 240-Pin UDIMM (MO-237 R/C D) Features 240-pin, unbuffered dual in-line memory

More information

Getting CPI under 1: Outline

Getting CPI under 1: Outline CMSC 411 Computer Systems Architecture Lecture 12 Instruction Level Parallelism 5 (Improving CPI) Getting CPI under 1: Outline More ILP VLIW branch target buffer return address predictor superscalar more

More information

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by:

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row (~every 8 msec). Static RAM may be

More information

DDR SDRAM UDIMM MT16VDDT6464A 512MB MT16VDDT12864A 1GB MT16VDDT25664A 2GB

DDR SDRAM UDIMM MT16VDDT6464A 512MB MT16VDDT12864A 1GB MT16VDDT25664A 2GB DDR SDRAM UDIMM MT16VDDT6464A 512MB MT16VDDT12864A 1GB MT16VDDT25664A 2GB For component data sheets, refer to Micron s Web site: www.micron.com 512MB, 1GB, 2GB (x64, DR) 184-Pin DDR SDRAM UDIMM Features

More information

1. The values of t RCD and t RP for -335 modules show 18ns to align with industry specifications; actual DDR SDRAM device specifications are 15ns.

1. The values of t RCD and t RP for -335 modules show 18ns to align with industry specifications; actual DDR SDRAM device specifications are 15ns. UDIMM MT4VDDT1664A 128MB MT4VDDT3264A 256MB For component data sheets, refer to Micron s Web site: www.micron.com 128MB, 256MB (x64, SR) 184-Pin UDIMM Features Features 184-pin, unbuffered dual in-line

More information

CS425 Computer Systems Architecture

CS425 Computer Systems Architecture CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order

More information

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor 1 CPI < 1? How? From Single-Issue to: AKS Scalar Processors Multiple issue processors: VLIW (Very Long Instruction Word) Superscalar processors No ISA Support Needed ISA Support Needed 2 What if dynamic

More information

ECE7995 (4) Basics of Memory Hierarchy. [Adapted from Mary Jane Irwin s slides (PSU)]

ECE7995 (4) Basics of Memory Hierarchy. [Adapted from Mary Jane Irwin s slides (PSU)] ECE7995 (4) Basics of Memory Hierarchy [Adapted from Mary Jane Irwin s slides (PSU)] Major Components of a Computer Processor Devices Control Memory Input Datapath Output Performance Processor-Memory Performance

More information

The Alpha and Microprocessors:

The Alpha and Microprocessors: The Alpha 21364 and 21464 icroprocessors: Continuing the Performance Lead Beyond Y2K Shubu ukherjee, Ph.D. Principal Hardware Engineer VSSAD Labs, Alpha Development Group Compaq Computer Corporation Shrewsbury,

More information

IMME256M64D2DUD8AG (Die Revision E) 2GByte (256M x 64 Bit)

IMME256M64D2DUD8AG (Die Revision E) 2GByte (256M x 64 Bit) Product Specification Rev. 1.0 2015 IMME256M64D2DUD8AG (Die Revision E) 2GByte (256M x 64 Bit) 2GB DDR2 Unbuffered DIMM By ECC DRAM RoHS Compliant Product Product Specification 1.0 1 IMME256M64D2DUD8AG

More information

The Design of the KiloCore Chip

The Design of the KiloCore Chip The Design of the KiloCore Chip Aaron Stillmaker*, Brent Bohnenstiehl, Bevan Baas DAC 2017: Design Challenges of New Processor Architectures University of California, Davis VLSI Computation Laboratory

More information

DDR SDRAM SODIMM MT16VDDF6464H 512MB MT16VDDF12864H 1GB

DDR SDRAM SODIMM MT16VDDF6464H 512MB MT16VDDF12864H 1GB SODIMM MT16VDDF6464H 512MB MT16VDDF12864H 1GB 512MB, 1GB (x64, DR) 200-Pin DDR SODIMM Features For component data sheets, refer to Micron s Web site: www.micron.com Features 200-pin, small-outline dual

More information

Unleashing the Power of Embedded DRAM

Unleashing the Power of Embedded DRAM Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers

More information

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1

CSE 820 Graduate Computer Architecture. week 6 Instruction Level Parallelism. Review from Last Time #1 CSE 820 Graduate Computer Architecture week 6 Instruction Level Parallelism Based on slides by David Patterson Review from Last Time #1 Leverage Implicit Parallelism for Performance: Instruction Level

More information

+1 (479)

+1 (479) Memory Courtesy of Dr. Daehyun Lim@WSU, Dr. Harris@HMC, Dr. Shmuel Wimer@BIU and Dr. Choi@PSU http://csce.uark.edu +1 (479) 575-6043 yrpeng@uark.edu Memory Arrays Memory Arrays Random Access Memory Serial

More information

An Overview of Standard Cell Based Digital VLSI Design

An Overview of Standard Cell Based Digital VLSI Design An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,

More information

ENEE 759H, Spring 2005 Memory Systems: Architecture and

ENEE 759H, Spring 2005 Memory Systems: Architecture and SLIDE, Memory Systems: DRAM Device Circuits and Architecture Credit where credit is due: Slides contain original artwork ( Jacob, Wang 005) Overview Processor Processor System Controller Memory Controller

More information

Agenda. What is the Itanium Architecture? Terminology What is the Itanium Architecture? Thomas Siebold Technology Consultant Alpha Systems Division

Agenda. What is the Itanium Architecture? Terminology What is the Itanium Architecture? Thomas Siebold Technology Consultant Alpha Systems Division What is the Itanium Architecture? Thomas Siebold Technology Consultant Alpha Systems Division thomas.siebold@hp.com Agenda Terminology What is the Itanium Architecture? 1 Terminology Processor Architectures

More information

IMM64M64D1DVS8AG (Die Revision D) 512MByte (64M x 64 Bit)

IMM64M64D1DVS8AG (Die Revision D) 512MByte (64M x 64 Bit) Product Specification Rev. 1.0 2015 IMM64M64D1DVS8AG (Die Revision D) 512MByte (64M x 64 Bit) 512MB DDR VLP Unbuffered DIMM RoHS Compliant Product Product Specification 1.0 1 IMM64M64D1DVS8AG Version:

More information

Networks for Multi-core Chips A A Contrarian View. Shekhar Borkar Aug 27, 2007 Intel Corp.

Networks for Multi-core Chips A A Contrarian View. Shekhar Borkar Aug 27, 2007 Intel Corp. Networks for Multi-core hips A A ontrarian View Shekhar Borkar Aug 27, 2007 Intel orp. 1 Outline Multi-core system outlook On die network challenges A simple contrarian proposal Benefits Summary 2 A Sample

More information

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements EE241 - Spring 2007 Advanced Digital Integrated Circuits Lecture 22: SRAM Announcements Homework #4 due today Final exam on May 8 in class Project presentations on May 3, 1-5pm 2 1 Class Material Last

More information

UMBC. Rubini and Corbet, Linux Device Drivers, 2nd Edition, O Reilly. Systems Design and Programming

UMBC. Rubini and Corbet, Linux Device Drivers, 2nd Edition, O Reilly. Systems Design and Programming Systems Design and Programming Instructor: Professor Jim Plusquellic Text: Barry B. Brey, The Intel Microprocessors, 8086/8088, 80186/80188, 80286, 80386, 80486, Pentium and Pentium Pro Processor Architecture,

More information

CS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

CS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. CS 33 Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. Hyper Threading Instruction Control Instruction Control Retirement Unit

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 22 Advanced Processors III 2005-4-12 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/

More information

Memory Hierarchies 2009 DAT105

Memory Hierarchies 2009 DAT105 Memory Hierarchies Cache performance issues (5.1) Virtual memory (C.4) Cache performance improvement techniques (5.2) Hit-time improvement techniques Miss-rate improvement techniques Miss-penalty improvement

More information

COEN-4730 Computer Architecture Lecture 12. Testing and Design for Testability (focus: processors)

COEN-4730 Computer Architecture Lecture 12. Testing and Design for Testability (focus: processors) 1 COEN-4730 Computer Architecture Lecture 12 Testing and Design for Testability (focus: processors) Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University 1 Outline Testing

More information

DDR3 Memory for Intel-based G6 Servers

DDR3 Memory for Intel-based G6 Servers DDR3 Memory for Intel-based G6 Servers March 2009 2009 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. New memory technology with G6; DDR-3

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights The Alpha 21264 Microprocessor: Out-of-Order ution at 600 MHz R. E. Kessler Compaq Computer Corporation Shrewsbury, MA 1 Some Highlights Continued Alpha performance leadership 600 MHz operation in 0.35u

More information

The Processor That Don't Cost a Thing

The Processor That Don't Cost a Thing The Processor That Don't Cost a Thing Peter Hsu, Ph.D. Peter Hsu Consulting, Inc. http://cs.wisc.edu/~peterhsu DRAM+Processor Commercial demand Heat stiffling industry's growth Heat density limits small

More information

DDR SDRAM UDIMM MT8VDDT3264A 256MB MT8VDDT6464A 512MB For component data sheets, refer to Micron s Web site:

DDR SDRAM UDIMM MT8VDDT3264A 256MB MT8VDDT6464A 512MB For component data sheets, refer to Micron s Web site: DDR SDRAM UDIMM MT8VDDT3264A 256MB MT8VDDT6464A 512MB For component data sheets, refer to Micron s Web site: www.micron.com 256MB, 512MB (x64, SR) 184-Pin DDR SDRAM UDIMM Features Features 184-pin, unbuffered

More information

Memory Hierarchy and Caches

Memory Hierarchy and Caches Memory Hierarchy and Caches COE 301 / ICS 233 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline

More information

COMPUTER ARCHITECTURES

COMPUTER ARCHITECTURES COMPUTER ARCHITECTURES Random Access Memory Technologies Gábor Horváth BUTE Department of Networked Systems and Services ghorvath@hit.bme.hu Budapest, 2019. 02. 24. Department of Networked Systems and

More information