Gemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc.

Size: px
Start display at page:

Download "Gemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc."

Transcription

1 Gemini: A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor Sanjiv Kapil Gemini Architect Sun Microsystems, Inc.

2 Design Goals Designed for compute-dense, transaction oriented systems (webservers, appservers...) Small form factors (U, 2U, Blades) Low power, low cost Large physical memory (64-bit addressing) Enterprise-class RAS Focus on throughput performance rather than single thread performance Support for - to 4-way SMP with simultaneous execution of 2-8 threads Maintains binary compatibility with SPARC V9 code 2

3 UltraSPARC Processor Roadmap Data Facing s-series UltraSPARC III X Today UltraSPARC IV 2X UltraSPARC V 5X i-series Network Facing UltraSPARC IIiUltraSPARC IIIi 0.5X h-series X.3X Gemini Gemini 3X Niagara 5X

4 Features Dual 4-issue superscalar cores Based on 64-bit SPARC V9 processor architecture 2 integer ops, 2 FPU/graphics (VIS.0) ops/cycle Dual integrated 0.5MB L2 caches (one per core) 4-way set associative Way-locking per core Pin-compatible with UltraSPARC IIIi processor JBus system bus interface Integrated memory interface to standard DRAM Shared 28-bit DDR controller Up to 6 GBs total using 2-4 DIMMS 4

5 Features (con.) Power management modes (E*Star) Transition to either Fcpu/2 or Fcpu/32 Robust RAS support Parity checking on L caches, JBus ECC on L2 caches, DRAM CMT features Software-controlled core disable Dynamic cache-aware core parking Adheres to UltraSPARC CMT Architectural Specification 5

6 Gemini-based System 4-Way SMP/8-thread System.5V DTL C 0 CPU 0 DDR C JBUS 0 DDR C 0 C CPU Repeater C 0 CPU 2 DDR C JBUS DDR C 0 28 C CPU MHz 4.2 GB/s 200 MHz 3.2 GB/s JI O PCI A B JBUS 2 JI O PCI A B 32/64 bit 33/66 MHz 6

7 Technology Dual-core CMT design Transistor count: 80 million Die size: 206 sq. mm. Package: 959 pin ceramic upga Maximum power consumption: 32 GHz,.3V Technology: TI's advanced 30 nm CMOS process, 300 mm wafers 53 nm Leff 7LM Cu, low-k dielectric (OSG) dual Vt 7

8 Block Diagram Dual Core 64Bit Superscalar Processor 8 D i e P h o t D D R Prefetch & Dispatch I$ Grouping Integer Exe Unit Grouping Floating and Floating and Graphics Unit Integer Exe Unit RF Prefetch & Dispatch I$ RF Graphics Unit IBUF IBUF CORE0 DDR CTRL CORE ITLB L2$ Unit ITLB MMU LSU D$ Ld Buf St Buf L2$ UNIT LSU D$ Ld Buf St Buf MMU DTLB JBUS CTRL JBUS CTRL DTLB JBUS Arbiter

9 L2 Caches Unified 52KB L2 cache per core: 4-way set associative, 64B line size, Pseudo-random replacement DP Test 2-2 mode access with 4-cycle latency 8 ECC bits for 64b data with physical bit scrambling, parity-protected tags L2 datapath and control logic integrated with SRAMs Random 28K Tag CT L Way Sel 28K Tag 28K 28K L2Cache Pipeline Diagram L2Cache Area = mm 2 Power = 2.07 Watts 9

10 Memory Controller Specification s - Shared controller for two cores - DDR compliant - Max 4 DIMMs (single-/double-sided) -Total memory 256MB to 6GB - SSTL Input/Output - Memory I/F: 28-bit data + 9 ECC check bits - E*Star support (½ or /32 of Fcpu) -PTB to track 6 open pages GB/s peak B/W - Measured local access latency: 96ns DDR 266 mhz 28+EC C PT B DP AS I DP L2 28 CA M R F DDR Unit DP DP 28+EC Addr/Data C JBus Unit Ct l Squencng Ctl QUEUE Ctl Ct l 0

11 Memory Subsystem Details Specifications Caches: I: 6KB 2-way associative PIPT with 32B line size D: 6KB direct VIPT with 32B line size L2: 52 KB, 4-way set associative 64B line size Pseudo-random replacement PIPT Memory: DDR: 2.5 v (2.625v tolerant) SSTL 28-bit data + 9-bit ECC Peak memory B/W: 4.2GB/@33Mhz 28/256/52/G bit 2-4 Dimm support 64-bit virtual address space Main Features I: Parity protection on tags/data s/w recoverable D: Parity protection on data and tag L2: Parity protection on tags, ECC on data 2-2 mode access 3-ways locking support Flush to memory support Shared for dual cores PTB tracks 6 open pages E*Star modes capable Separate PLL for DDR/E*Star support 4-Way SMP can address 64GB space CMP: Core disable/parking/error reporting JTAG/RAMTEST access to CSR's Non-core/Core-specific error reporting Full 6 GB addressing by each core Cache-aware parking (No instr. Execution)

12 JBus Unit Specifications Jbus Unit IO: JBus :.5 V DTL I/O's 6 byte shared address/data MHz clock Impedence control on DTL I/O's JBUS controller: - Two level arbitration of JBus for dual cores. - Snoop capability for dual L2 cache - E*Star transitions - 28-bit datapaths to dual L2 cache -CMP features support: -Core disable -Core parking -DMA 6 GB addressing support - Error detection for address/data D P CA M R F D P L2 Unit 0 28 Mem Ctl L2 Unit CT CT L Err L Err Snp Fill BU S Arbite r Mhz Snp Fill BU S JJBU S 2

13 Area Distribution Integer FGU DCache ICache LSU ECU MISC Core Area = 28.6 mm % 35.34% 4.07% 5.23%.77% 5.83% CORE L2$ IO MCU JBU MISC CPU Area = 206 mm 2 Core Core 0 52KB JB U MC U 52KB 3

14 Power Distribution In te g e r FGU LS U ECU MIS C Core Core 0.45 Core Power = 6.7 W JBU 0.23% 45.7% 7.2% 4.09% 8.76% CORE L2$ IO MCU JBU DC/RPTR L2Cache MCU L2Cache 3.99% CPU Power = 29.3 W 4

15 UltraSPARC Design Evolution 7.7 mm 2.2 mm 4.4 mm 36,242 trans/mm 2 49 mm 2 388,350 trans/mm 206 mm 2 2 6,508 trans/mm 2 35 mm 2 UltraSPARC I M transistors 500 nm CMOS 4 LM Al 3.3V, 67 MHz SPEC2K int/rate: 76/0.9* *calculated from measured SPEC95 scores UltraSPARC II M transistors 350 nm CMOS 5 LM Al 2.5V, 250 MHz SPEC2K int/rate: 20/.4* Gemini M transistors 30 nm CMOS 7 LM Cu.3V, 200 MHz SPEC2K int/rate: 5/ 4x perf, 8x throughput 5

16 Performance Peak 3.2 GB/s system I/O bandwidth Peak 4.26 GB/s memory bandwidth Average memory latency for local DRAM 96ns CPUs Int_Rate /Watt CPUs SPECjbb2k /Watt CPUs SPECweb99 /Watt CPUs FP_Rate /Watt All GHz with 6 GB memory, 2 active cores/processor All numbers preliminary based on measured data 6

17 Benchmark: SPECint/fp_rate SPECint_rate/ wat t SPECfp_rat e/ watt GM P GM 2P HM P HM 2P Cel P Cel 2P Xn P Xn 2P 0.00 GM P GM 2P HM P HM 2 P Ce l P Ce l 2P Xn P Xn 2P Sources: AMD Opteron Processor Data Sheet Apr. 2003; Intel Celeron Processor Data Sheet (2-2.6 GHz) Jun. 2003; Intel Xeon Processor Data Sheet (2-3.06GHz) Jul

18 Benchmark: SPECweb, SPECjbb 66 SPECweb/ wat t SPECjbb/ watt GM P GM 2P HM P HM 2P Cel P Cel 2P Xn P Xn 2P GM P GM 2P HM P HM 2P Ce l P Ce l 2P Xn P Xn 2P Sources: AMD Opteron Processor Data Sheet Apr. 2003; Intel Celeron Processor Data Sheet (2-2.6 GHz) Jun. 2003; Intel Xeon Processor Data Sheet (2-3.06GHz) Jul

19 Summary First generation CMT design for highdensity, low-cost, network-facing systems Optimized for throughput rather than single-thread performance Maintains binary compatibility with existing SPARC code base Highly leveraged design methodology 9

20 Q & A 20

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded

Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded Reference Case Study of a Multi-core, Multithreaded Processor The Sun T ( Niagara ) Computer Architecture, A Quantitative Approach, Fourth Edition, by John Hennessy and David Patterson, chapter. :/C:8

More information

OPENSPARC T1 OVERVIEW

OPENSPARC T1 OVERVIEW Chapter Four OPENSPARC T1 OVERVIEW Denis Sheahan Distinguished Engineer Niagara Architecture Group Sun Microsystems Creative Commons 3.0United United States License Creative CommonsAttribution-Share Attribution-Share

More information

Niagara-2: A Highly Threaded Server-on-a-Chip. Greg Grohoski Distinguished Engineer Sun Microsystems

Niagara-2: A Highly Threaded Server-on-a-Chip. Greg Grohoski Distinguished Engineer Sun Microsystems Niagara-2: A Highly Threaded Server-on-a-Chip Greg Grohoski Distinguished Engineer Sun Microsystems August 22, 2006 Authors Jama Barreh Jeff Brooks Robert Golla Greg Grohoski Rick Hetherington Paul Jordan

More information

Jim Keller. Digital Equipment Corp. Hudson MA

Jim Keller. Digital Equipment Corp. Hudson MA Jim Keller Digital Equipment Corp. Hudson MA ! Performance - SPECint95 100 50 21264 30 21164 10 1995 1996 1997 1998 1999 2000 2001 CMOS 5 0.5um CMOS 6 0.35um CMOS 7 0.25um "## Continued Performance Leadership

More information

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola 1. Microprocessor Architectures 1.1 Intel 1.2 Motorola 1.1 Intel The Early Intel Microprocessors The first microprocessor to appear in the market was the Intel 4004, a 4-bit data bus device. This device

More information

Leveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD

Leveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD Leveraging OpenSPARC ESA Round Table 2006 on Next Generation Microprocessors for Space Applications G.Furano, L.Messina TEC- OpenSPARC T1 The T1 is a new-from-the-ground-up SPARC microprocessor implementation

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 19 Advanced Processors III 2006-11-2 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/ 1 Last

More information

SPARC64 X: Fujitsu s New Generation 16 core Processor for UNIX Server

SPARC64 X: Fujitsu s New Generation 16 core Processor for UNIX Server SPARC64 X: Fujitsu s New Generation 16 core Processor for UNIX Server 19 th April 2013 Toshio Yoshida Processor Development Division Enterprise Server Business Unit Fujitsu Limited SPARC64 TM SPARC64 TM

More information

VMware VMmark V1.1 Results

VMware VMmark V1.1 Results VMware VMmark V1.1 Results Vendor and Hardware Platform: Sun Microsystems Sun Fire X4450 Virtualization Platform: VMware ESX 3.5.0 Update 2 (build 110268) Performance Section Performance Tested By: Sun

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights The Alpha 21264 Microprocessor: Out-of-Order ution at 600 MHz R. E. Kessler Compaq Computer Corporation Shrewsbury, MA 1 Some Highlights Continued Alpha performance leadership 600 MHz operation in 0.35u

More information

Intel Enterprise Processors Technology

Intel Enterprise Processors Technology Enterprise Processors Technology Kosuke Hirano Enterprise Platforms Group March 20, 2002 1 Agenda Architecture in Enterprise Xeon Processor MP Next Generation Itanium Processor Interconnect Technology

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA The Alpha 21264 Microprocessor: Out-of-Order ution at 600 Mhz R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA 1 Some Highlights z Continued Alpha performance leadership y 600 Mhz operation in

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Intel released new technology call P6P

Intel released new technology call P6P P6 and IA-64 8086 released on 1978 Pentium release on 1993 8086 has upgrade by Pipeline, Super scalar, Clock frequency, Cache and so on But 8086 has limit, Hard to improve efficiency Intel released new

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 22 Advanced Processors III 2005-4-12 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

A 1.5GHz Third Generation Itanium Processor

A 1.5GHz Third Generation Itanium Processor A 1.5GHz Third Generation Itanium Processor Jason Stinson, Stefan Rusu Intel Corporation, Santa Clara, CA 1 Outline Processor highlights Process technology details Itanium processor evolution Block diagram

More information

How much energy can you save with a multicore computer for web applications?

How much energy can you save with a multicore computer for web applications? How much energy can you save with a multicore computer for web applications? Peter Strazdins Computer Systems Group, Department of Computer Science, The Australian National University seminar at Green

More information

Multi-Core Microprocessor Chips: Motivation & Challenges

Multi-Core Microprocessor Chips: Motivation & Challenges Multi-Core Microprocessor Chips: Motivation & Challenges Dileep Bhandarkar, Ph. D. Architect at Large DEG Architecture & Planning Digital Enterprise Group Intel Corporation October 2005 Copyright 2005

More information

WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX924 S2

WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX924 S2 WHITE PAPER PERFORMANCE REPORT PRIMERGY BX924 S2 WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX924 S2 This document contains a summary of the benchmarks executed for the PRIMERGY BX924

More information

POWER6 Processor and Systems

POWER6 Processor and Systems POWER6 Processor and Systems Jim McInnes jimm@ca.ibm.com Compiler Optimization IBM Canada Toronto Software Lab Role I am a Technical leader in the Compiler Optimization Team Focal point to the hardware

More information

Hardware and Software solutions for scaling highly threaded processors. Denis Sheahan Distinguished Engineer Sun Microsystems Inc.

Hardware and Software solutions for scaling highly threaded processors. Denis Sheahan Distinguished Engineer Sun Microsystems Inc. Hardware and Software solutions for scaling highly threaded processors Denis Sheahan Distinguished Engineer Sun Microsystems Inc. Agenda Chip Multi-threaded concepts Lessons learned from 6 years of CMT

More information

SPARC64 X: Fujitsu s New Generation 16 Core Processor for the next generation UNIX servers

SPARC64 X: Fujitsu s New Generation 16 Core Processor for the next generation UNIX servers X: Fujitsu s New Generation 16 Processor for the next generation UNIX servers August 29, 2012 Takumi Maruyama Processor Development Division Enterprise Server Business Unit Fujitsu Limited All Rights Reserved,Copyright

More information

POWER7: IBM's Next Generation Server Processor

POWER7: IBM's Next Generation Server Processor POWER7: IBM's Next Generation Server Processor Acknowledgment: This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002 Outline

More information

PowerPC 740 and 750

PowerPC 740 and 750 368 floating-point registers. A reorder buffer with 16 elements is used as well to support speculative execution. The register file has 12 ports. Although instructions can be executed out-of-order, in-order

More information

The AMD64 Technology for Server and Workstation. Dr. Ulrich Knechtel Enterprise Program Manager EMEA

The AMD64 Technology for Server and Workstation. Dr. Ulrich Knechtel Enterprise Program Manager EMEA The AMD64 Technology for Server and Workstation Dr. Ulrich Knechtel Enterprise Program Manager EMEA Agenda Direct Connect Architecture AMD Opteron TM Processor Roadmap Competition OEM support The AMD64

More information

POWER7: IBM's Next Generation Server Processor

POWER7: IBM's Next Generation Server Processor Hot Chips 21 POWER7: IBM's Next Generation Server Processor Ronald Kalla Balaram Sinharoy POWER7 Chief Engineer POWER7 Chief Core Architect Acknowledgment: This material is based upon work supported by

More information

VMware VMmark V1.1 Results

VMware VMmark V1.1 Results VMware VMmark V1.1 Results Vendor and Hardware Platform: IBM BladeCenter LS42 Virtualization Platform: VMware ESX 3.5.0 U3 Build 121023 VMmark V1.1 Score = 16.81 @ 11 Tiles Tested By: IBM Inc., RTP, NC

More information

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory

CS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

Basics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS

Basics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS Basics DRAM ORGANIZATION DRAM Word Line Bit Line Storage element (capacitor) In/Out Buffers Decoder Sense Amps... Bit Lines... Switching element Decoder... Word Lines... Memory Array Page 1 Basics BUS

More information

HW Trends and Architectures

HW Trends and Architectures Pavel Tvrdík, Jiří Kašpar (ČVUT FIT) HW Trends and Architectures MI-POA, 2011, Lecture 1 1/29 HW Trends and Architectures prof. Ing. Pavel Tvrdík CSc. Ing. Jiří Kašpar Department of Computer Systems Faculty

More information

VMware VMmark V1.1 Results

VMware VMmark V1.1 Results VMware VMmark V1.1 Results Vendor and Hardware Platform: HP ProLiant DL585G5 Virtualization Platform: VMware ESX 4.0 (build 148783) Performance Section Performance Tested By: Hewlett Packard Test Date:

More information

JBus Architecture Overview

JBus Architecture Overview JBus Architecture Overview Technical Whitepaper Version 1.0, April 2003 This paper provides an overview of the JBus architecture. The Jbus interconnect features a 128-bit packet-switched, split-transaction

More information

Overview of the SPARC Enterprise Servers

Overview of the SPARC Enterprise Servers Overview of the SPARC Enterprise Servers SPARC Enterprise Technologies for the Datacenter Ideal for Enterprise Application Deployments System Overview Virtualization technologies > Maximize system utilization

More information

EPYC VIDEO CUG 2018 MAY 2018

EPYC VIDEO CUG 2018 MAY 2018 AMD UPDATE CUG 2018 EPYC VIDEO CRAY AND AMD PAST SUCCESS IN HPC AMD IN TOP500 LIST 2002 TO 2011 2011 - AMD IN FASTEST MACHINES IN 11 COUNTRIES ZEN A FRESH APPROACH Designed from the Ground up for Optimal

More information

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation Mainstream Computer System Components CPU Core 2 GHz - 3.0 GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation One core or multi-core (2-4) per chip Multiple FP, integer

More information

VMware VMmark V1.1 Results

VMware VMmark V1.1 Results VMware VMmark V1.1 Results Vendor and Hardware Platform: HP ProLiant DL585G5 Virtualization Platform: VMware ESX 3.5 Update 3 (build 123630) Performance Section Performance Tested By: Hewlett Packard Test

More information

Power Technology For a Smarter Future

Power Technology For a Smarter Future 2011 IBM Power Systems Technical University October 10-14 Fontainebleau Miami Beach Miami, FL IBM Power Technology For a Smarter Future Jeffrey Stuecheli Power Processor Development Copyright IBM Corporation

More information

PERFORMANCE MEASUREMENT

PERFORMANCE MEASUREMENT Administrivia CMSC 411 Computer Systems Architecture Lecture 3 Performance Measurement and Reliability Homework problems for Unit 1 posted today due next Thursday, 2/12 Start reading Appendix C Basic Pipelining

More information

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Advanced Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000

More information

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor The A High Performance Out-of-Order Processor Hot Chips VIII IEEE Computer Society Stanford University August 19, 1996 Hewlett-Packard Company Engineering Systems Lab - Fort Collins, CO - Cupertino, CA

More information

Portland State University ECE 588/688. Cray-1 and Cray T3E

Portland State University ECE 588/688. Cray-1 and Cray T3E Portland State University ECE 588/688 Cray-1 and Cray T3E Copyright by Alaa Alameldeen 2014 Cray-1 A successful Vector processor from the 1970s Vector instructions are examples of SIMD Contains vector

More information

Itanium 2 Processor Microarchitecture Overview

Itanium 2 Processor Microarchitecture Overview Itanium 2 Processor Microarchitecture Overview Don Soltis, Mark Gibson Cameron McNairy, August 2002 Block Diagram F 16KB L1 I-cache Instr 2 Instr 1 Instr 0 M/A M/A M/A M/A I/A Template I/A B B 2 FMACs

More information

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 22

ECE 571 Advanced Microprocessor-Based Design Lecture 22 ECE 571 Advanced Microprocessor-Based Design Lecture 22 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 19 April 2018 HW#11 will be posted Announcements 1 Reading 1 Exploring DynamIQ

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved

More information

VMware VMmark V1.1 Results

VMware VMmark V1.1 Results Vendor and Hardware Platform: IBM System x3850 M2 Virtualization Platform: VMware ESX 3.5.0 U2 Build 103908 VMware VMmark V1.1 Results Tested By: IBM Inc., RTP, NC Test Date: 2008-08-14 Performance Section

More information

Family 15h Models 10h-1Fh AMD Athlon Processor Product Data Sheet

Family 15h Models 10h-1Fh AMD Athlon Processor Product Data Sheet Family 15h Models 10h-1Fh AMD Athlon Publication # 52422 Revision: 3.00 Issue Date: July 2012 Advanced Micro Devices 2012 Advanced Micro Devices, Inc. All rights reserved. The contents of this document

More information

Computer Architecture. Introduction. Lynn Choi Korea University

Computer Architecture. Introduction. Lynn Choi Korea University Computer Architecture Introduction Lynn Choi Korea University Class Information Lecturer Prof. Lynn Choi, School of Electrical Eng. Phone: 3290-3249, 공학관 411, lchoi@korea.ac.kr, TA: 윤창현 / 신동욱, 3290-3896,

More information

Lecture 14: Multithreading

Lecture 14: Multithreading CS 152 Computer Architecture and Engineering Lecture 14: Multithreading John Wawrzynek Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~johnw

More information

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture

More information

MIPS R5000 Microprocessor. Technical Backgrounder. 32 kb I-cache and 32 kb D-cache, each 2-way set associative

MIPS R5000 Microprocessor. Technical Backgrounder. 32 kb I-cache and 32 kb D-cache, each 2-way set associative MIPS R5000 Microprocessor Technical Backgrounder Performance: SPECint95 5.5 SPECfp95 5.5 Instruction Set ISA Compatibility Pipeline Clock System Interface clock Caches TLB Power dissipation: Supply voltage

More information

SGI Challenge Overview

SGI Challenge Overview CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Symmetric Multiprocessors Part 2 (Case Studies) Copyright 2001 Mark D. Hill University of Wisconsin-Madison Slides are derived

More information

Six-Core AMD Opteron Processor

Six-Core AMD Opteron Processor What s you should know about the Six-Core AMD Opteron Processor (Codenamed Istanbul ) Six-Core AMD Opteron Processor Versatility Six-Core Opteron processors offer an optimal mix of performance, energy

More information

Advanced Processor Architecture

Advanced Processor Architecture Advanced Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong

More information

Thread-level Parallelism for the Masses. Kunle Olukotun Computer Systems Lab Stanford University 2007

Thread-level Parallelism for the Masses. Kunle Olukotun Computer Systems Lab Stanford University 2007 Thread-level Parallelism for the Masses Kunle Olukotun Computer Systems Lab Stanford University 2007 The World has Changed Process Technology Stops Improving! Moore s law but! Transistors don t get faster

More information

VMware VMmark V1.1 Results

VMware VMmark V1.1 Results VMware VMmark V1.1 Results Vendor and Hardware Platform: HP Proliant DL580 G5 Virtualization Platform: VMware ESX 3.5.0 Update 1, build 82663 VMmark V1.1 Score = 14.14 @ 10 Tiles Tested By: Hewlett Packard

More information

The RM9150 and the Fast Device Bus High Speed Interconnect

The RM9150 and the Fast Device Bus High Speed Interconnect The RM9150 and the Fast Device High Speed Interconnect John R. Kinsel Principal Engineer www.pmc -sierra.com 1 August 2004 Agenda CPU-based SOC Design Challenges Fast Device (FDB) Overview Generic Device

More information

A Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures

A Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures A Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures W.M. Roshan Weerasuriya and D.N. Ranasinghe University of Colombo School of Computing A Comparative

More information

The Alpha and Microprocessors:

The Alpha and Microprocessors: The Alpha 21364 and 21464 icroprocessors: Continuing the Performance Lead Beyond Y2K Shubu ukherjee, Ph.D. Principal Hardware Engineer VSSAD Labs, Alpha Development Group Compaq Computer Corporation Shrewsbury,

More information

UltraSPARC User s Manual

UltraSPARC User s Manual UltraSPARC User s Manual UltraSPARC-I UltraSPARC-II July 1997 901 San Antonio Road Palo Alto, CA 94303 Part No: 802-7220-02 This July 1997-02 Revision is only available online. The only changes made were

More information

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Sept. 5 th : Homework 1 release (due on Sept.

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-2800 (DDR3-600) 200 MHz (internal base chip clock) 8-way interleaved

More information

SPARC64 VII Fujitsu s Next Generation Quad-Core Processor

SPARC64 VII Fujitsu s Next Generation Quad-Core Processor SPARC64 VII Fujitsu s Next Generation Quad-Core Processor August 26, 2008 Takumi Maruyama LSI Development Division Next Generation Technical Computing Unit Fujitsu Limited High Performance Technology High

More information

Goro Watanabe. Bill King. OOW 2013 The Best Platform for Big Data and Oracle Database 12c. EVP Fujitsu R&D Center North America

Goro Watanabe. Bill King. OOW 2013 The Best Platform for Big Data and Oracle Database 12c. EVP Fujitsu R&D Center North America OOW 2013 The Best Platform for Big Data and Oracle Database 12c Goro Watanabe EVP Fujitsu R&D Center North America Bill King EVP Platform Products Group Fujitsu America, Inc. Overview 1. Fujitsu: Quick

More information

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University

Advanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University Advanced d Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000

More information

VMware VMmark V1.1 Results

VMware VMmark V1.1 Results VMware VMmark V1.1 Results Vendor and Hardware Platform: Inspur NF5280 Virtualization Platform: VMware ESX Server 4.0 build 148592 Performance Section Performance Tested By: Inspur Inc. Configuration Section

More information

Computer System Components

Computer System Components Computer System Components CPU Core 1 GHz - 3.2 GHz 4-way Superscaler RISC or RISC-core (x86): Deep Instruction Pipelines Dynamic scheduling Multiple FP, integer FUs Dynamic branch prediction Hardware

More information

VMware VMmark V1.1 Results

VMware VMmark V1.1 Results VMware VMmark V1.1 Results Vendor and Hardware Platform: HP ProLiant DL370 G6 Virtualization Platform: VMware ESX 4.0 build 148783 VMmark V1.1 Score = 23.96 @ 16 Tiles Performance Section Performance Tested

More information

CPUs vs. Memory Hierarchies

CPUs vs. Memory Hierarchies CPUs vs. ory Hierarchies eller hur mår Moores lag Erik Hagersten Uppsala Universitet CPU Improvements Moore s Law 2x performance improvm./18 mo (Gordon Moore actually never said this ) Relative Performance

More information

IBM's POWER5 Micro Processor Design and Methodology

IBM's POWER5 Micro Processor Design and Methodology IBM's POWER5 Micro Processor Design and Methodology Ron Kalla IBM Systems Group Outline POWER5 Overview Design Process Power POWER Server Roadmap 2001 POWER4 2002-3 POWER4+ 2004* POWER5 2005* POWER5+ 2006*

More information

VMware VMmark V1.1.1 Results

VMware VMmark V1.1.1 Results VMware VMmark V1.1.1 Results Vendor and Hardware Platform: SGI XE500 Virtualization Platform: VMware ESX 4.0 Update 1 (build 208167) Performance Section Performance Tested By: SGI Test Date: 02/16/10 Configuration

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

Agenda. System Performance Scaling of IBM POWER6 TM Based Servers

Agenda. System Performance Scaling of IBM POWER6 TM Based Servers System Performance Scaling of IBM POWER6 TM Based Servers Jeff Stuecheli Hot Chips 19 August 2007 Agenda Historical background POWER6 TM chip components Interconnect topology Cache Coherence strategies

More information

Power 7. Dan Christiani Kyle Wieschowski

Power 7. Dan Christiani Kyle Wieschowski Power 7 Dan Christiani Kyle Wieschowski History 1980-2000 1980 RISC Prototype 1990 POWER1 (Performance Optimization With Enhanced RISC) (1 um) 1993 IBM launches 66MHz POWER2 (.35 um) 1997 POWER2 Super

More information

Family 15h Models 00h-0Fh AMD Opteron Processor Product Data Sheet

Family 15h Models 00h-0Fh AMD Opteron Processor Product Data Sheet Family 15h Models 00h-0Fh AMD Opteron Publication # 49687 Revision # 3.01 Issue Date October 2012 Advanced Micro Devices 2011, 2012 Advanced Micro Devices Inc. All rights reserved. The contents of this

More information

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor Taeho Kgil, Shaun D Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Steve Reinhardt, Krisztian Flautner,

More information

AMD Opteron 4200 Series Processor

AMD Opteron 4200 Series Processor What s new in the AMD Opteron 4200 Series Processor (Codenamed Valencia ) and the new Bulldozer Microarchitecture? Platform Processor Socket Chipset Opteron 4000 Opteron 4200 C32 56x0 / 5100 (codenamed

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 22 Advanced Processors III 2004-11-18 Dave Patterson (www.cs.berkeley.edu/~patterson) John Lazzaro (www.cs.berkeley.edu/~lazzaro) www-inst.eecs.berkeley.edu/~cs152/

More information

EE414 Embedded Systems Ch 5. Memory Part 2/2

EE414 Embedded Systems Ch 5. Memory Part 2/2 EE414 Embedded Systems Ch 5. Memory Part 2/2 Byung Kook Kim School of Electrical Engineering Korea Advanced Institute of Science and Technology Overview 6.1 introduction 6.2 Memory Write Ability and Storage

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers William Stallings Computer Organization and Architecture 8 th Edition Chapter 18 Multicore Computers Hardware Performance Issues Microprocessors have seen an exponential increase in performance Improved

More information

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )

Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Today Non-Uniform

More information

COSC 6385 Computer Architecture - Memory Hierarchies (III)

COSC 6385 Computer Architecture - Memory Hierarchies (III) COSC 6385 Computer Architecture - Memory Hierarchies (III) Edgar Gabriel Spring 2014 Memory Technology Performance metrics Latency problems handled through caches Bandwidth main concern for main memory

More information

Sam Naffziger. Gary Hammond. Next Generation Itanium Processor Overview. Lead Circuit Architect Microprocessor Technology Lab HP Corporation

Sam Naffziger. Gary Hammond. Next Generation Itanium Processor Overview. Lead Circuit Architect Microprocessor Technology Lab HP Corporation Next Generation Itanium Processor Overview Gary Hammond Principal Architect Enterprise Platform Group Corporation August 27-30, 2001 Sam Naffziger Lead Circuit Architect Microprocessor Technology Lab HP

More information

COSC 6385 Computer Architecture - Memory Hierarchies (II)

COSC 6385 Computer Architecture - Memory Hierarchies (II) COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity

More information

KeyStone II. CorePac Overview

KeyStone II. CorePac Overview KeyStone II ARM Cortex A15 CorePac Overview ARM A15 CorePac in KeyStone II Standard ARM Cortex A15 MPCore processor Cortex A15 MPCore version r2p2 Quad core, dual core, and single core variants 4096kB

More information

SunFire range of servers

SunFire range of servers TAKE IT TO THE NTH Frederic Vecoven Sun Microsystems SunFire range of servers System Components Fireplane Shared Interconnect Operating Environment Ultra SPARC & compilers Applications & Middleware Clustering

More information

Snoop-Based Multiprocessor Design III: Case Studies

Snoop-Based Multiprocessor Design III: Case Studies Snoop-Based Multiprocessor Design III: Case Studies Todd C. Mowry CS 41 March, Case Studies of Bus-based Machines SGI Challenge, with Powerpath SUN Enterprise, with Gigaplane Take very different positions

More information

ECE7995 (4) Basics of Memory Hierarchy. [Adapted from Mary Jane Irwin s slides (PSU)]

ECE7995 (4) Basics of Memory Hierarchy. [Adapted from Mary Jane Irwin s slides (PSU)] ECE7995 (4) Basics of Memory Hierarchy [Adapted from Mary Jane Irwin s slides (PSU)] Major Components of a Computer Processor Devices Control Memory Input Datapath Output Performance Processor-Memory Performance

More information

Convergence of Parallel Architecture

Convergence of Parallel Architecture Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty

More information

AMD s Next Generation Microprocessor Architecture

AMD s Next Generation Microprocessor Architecture AMD s Next Generation Microprocessor Architecture Fred Weber October 2001 Goals Build a next-generation system architecture which serves as the foundation for future processor platforms Enable a full line

More information

PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors

PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors Peter Sandon Senior PowerPC Processor Architect IBM Microelectronics All information in these materials is subject to

More information

Application Performance on Dual Processor Cluster Nodes

Application Performance on Dual Processor Cluster Nodes Application Performance on Dual Processor Cluster Nodes by Kent Milfeld milfeld@tacc.utexas.edu edu Avijit Purkayastha, Kent Milfeld, Chona Guiang, Jay Boisseau TEXAS ADVANCED COMPUTING CENTER Thanks Newisys

More information

Multithreaded Processors. Department of Electrical Engineering Stanford University

Multithreaded Processors. Department of Electrical Engineering Stanford University Lecture 12: Multithreaded Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 12-1 The Big Picture Previous lectures: Core design for single-thread

More information

A CHIP MULTITHREADED PROCESSOR FOR NETWORK- FACING WORKLOADS

A CHIP MULTITHREADED PROCESSOR FOR NETWORK- FACING WORKLOADS A CHIP MULTITHREADED PROCESSOR FOR NETWORK- FACING WORKLOADS THE ARTICLE DESCRIBES AN EARLY IMPLEMENTATION OF A CHIP MULTITHREADED (CMT) PROCESSOR, SUN MICROSYSTEM S TECHNOLOGY TO SUPPORT ITS THROUGHPUT

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information