Gemini: Sanjiv Kapil. A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor. Gemini Architect Sun Microsystems, Inc.
|
|
- Amie Jordan
- 5 years ago
- Views:
Transcription
1 Gemini: A Power-efficient Chip Multi-Threaded (CMT) UltraSPARC Processor Sanjiv Kapil Gemini Architect Sun Microsystems, Inc.
2 Design Goals Designed for compute-dense, transaction oriented systems (webservers, appservers...) Small form factors (U, 2U, Blades) Low power, low cost Large physical memory (64-bit addressing) Enterprise-class RAS Focus on throughput performance rather than single thread performance Support for - to 4-way SMP with simultaneous execution of 2-8 threads Maintains binary compatibility with SPARC V9 code 2
3 UltraSPARC Processor Roadmap Data Facing s-series UltraSPARC III X Today UltraSPARC IV 2X UltraSPARC V 5X i-series Network Facing UltraSPARC IIiUltraSPARC IIIi 0.5X h-series X.3X Gemini Gemini 3X Niagara 5X
4 Features Dual 4-issue superscalar cores Based on 64-bit SPARC V9 processor architecture 2 integer ops, 2 FPU/graphics (VIS.0) ops/cycle Dual integrated 0.5MB L2 caches (one per core) 4-way set associative Way-locking per core Pin-compatible with UltraSPARC IIIi processor JBus system bus interface Integrated memory interface to standard DRAM Shared 28-bit DDR controller Up to 6 GBs total using 2-4 DIMMS 4
5 Features (con.) Power management modes (E*Star) Transition to either Fcpu/2 or Fcpu/32 Robust RAS support Parity checking on L caches, JBus ECC on L2 caches, DRAM CMT features Software-controlled core disable Dynamic cache-aware core parking Adheres to UltraSPARC CMT Architectural Specification 5
6 Gemini-based System 4-Way SMP/8-thread System.5V DTL C 0 CPU 0 DDR C JBUS 0 DDR C 0 C CPU Repeater C 0 CPU 2 DDR C JBUS DDR C 0 28 C CPU MHz 4.2 GB/s 200 MHz 3.2 GB/s JI O PCI A B JBUS 2 JI O PCI A B 32/64 bit 33/66 MHz 6
7 Technology Dual-core CMT design Transistor count: 80 million Die size: 206 sq. mm. Package: 959 pin ceramic upga Maximum power consumption: 32 GHz,.3V Technology: TI's advanced 30 nm CMOS process, 300 mm wafers 53 nm Leff 7LM Cu, low-k dielectric (OSG) dual Vt 7
8 Block Diagram Dual Core 64Bit Superscalar Processor 8 D i e P h o t D D R Prefetch & Dispatch I$ Grouping Integer Exe Unit Grouping Floating and Floating and Graphics Unit Integer Exe Unit RF Prefetch & Dispatch I$ RF Graphics Unit IBUF IBUF CORE0 DDR CTRL CORE ITLB L2$ Unit ITLB MMU LSU D$ Ld Buf St Buf L2$ UNIT LSU D$ Ld Buf St Buf MMU DTLB JBUS CTRL JBUS CTRL DTLB JBUS Arbiter
9 L2 Caches Unified 52KB L2 cache per core: 4-way set associative, 64B line size, Pseudo-random replacement DP Test 2-2 mode access with 4-cycle latency 8 ECC bits for 64b data with physical bit scrambling, parity-protected tags L2 datapath and control logic integrated with SRAMs Random 28K Tag CT L Way Sel 28K Tag 28K 28K L2Cache Pipeline Diagram L2Cache Area = mm 2 Power = 2.07 Watts 9
10 Memory Controller Specification s - Shared controller for two cores - DDR compliant - Max 4 DIMMs (single-/double-sided) -Total memory 256MB to 6GB - SSTL Input/Output - Memory I/F: 28-bit data + 9 ECC check bits - E*Star support (½ or /32 of Fcpu) -PTB to track 6 open pages GB/s peak B/W - Measured local access latency: 96ns DDR 266 mhz 28+EC C PT B DP AS I DP L2 28 CA M R F DDR Unit DP DP 28+EC Addr/Data C JBus Unit Ct l Squencng Ctl QUEUE Ctl Ct l 0
11 Memory Subsystem Details Specifications Caches: I: 6KB 2-way associative PIPT with 32B line size D: 6KB direct VIPT with 32B line size L2: 52 KB, 4-way set associative 64B line size Pseudo-random replacement PIPT Memory: DDR: 2.5 v (2.625v tolerant) SSTL 28-bit data + 9-bit ECC Peak memory B/W: 4.2GB/@33Mhz 28/256/52/G bit 2-4 Dimm support 64-bit virtual address space Main Features I: Parity protection on tags/data s/w recoverable D: Parity protection on data and tag L2: Parity protection on tags, ECC on data 2-2 mode access 3-ways locking support Flush to memory support Shared for dual cores PTB tracks 6 open pages E*Star modes capable Separate PLL for DDR/E*Star support 4-Way SMP can address 64GB space CMP: Core disable/parking/error reporting JTAG/RAMTEST access to CSR's Non-core/Core-specific error reporting Full 6 GB addressing by each core Cache-aware parking (No instr. Execution)
12 JBus Unit Specifications Jbus Unit IO: JBus :.5 V DTL I/O's 6 byte shared address/data MHz clock Impedence control on DTL I/O's JBUS controller: - Two level arbitration of JBus for dual cores. - Snoop capability for dual L2 cache - E*Star transitions - 28-bit datapaths to dual L2 cache -CMP features support: -Core disable -Core parking -DMA 6 GB addressing support - Error detection for address/data D P CA M R F D P L2 Unit 0 28 Mem Ctl L2 Unit CT CT L Err L Err Snp Fill BU S Arbite r Mhz Snp Fill BU S JJBU S 2
13 Area Distribution Integer FGU DCache ICache LSU ECU MISC Core Area = 28.6 mm % 35.34% 4.07% 5.23%.77% 5.83% CORE L2$ IO MCU JBU MISC CPU Area = 206 mm 2 Core Core 0 52KB JB U MC U 52KB 3
14 Power Distribution In te g e r FGU LS U ECU MIS C Core Core 0.45 Core Power = 6.7 W JBU 0.23% 45.7% 7.2% 4.09% 8.76% CORE L2$ IO MCU JBU DC/RPTR L2Cache MCU L2Cache 3.99% CPU Power = 29.3 W 4
15 UltraSPARC Design Evolution 7.7 mm 2.2 mm 4.4 mm 36,242 trans/mm 2 49 mm 2 388,350 trans/mm 206 mm 2 2 6,508 trans/mm 2 35 mm 2 UltraSPARC I M transistors 500 nm CMOS 4 LM Al 3.3V, 67 MHz SPEC2K int/rate: 76/0.9* *calculated from measured SPEC95 scores UltraSPARC II M transistors 350 nm CMOS 5 LM Al 2.5V, 250 MHz SPEC2K int/rate: 20/.4* Gemini M transistors 30 nm CMOS 7 LM Cu.3V, 200 MHz SPEC2K int/rate: 5/ 4x perf, 8x throughput 5
16 Performance Peak 3.2 GB/s system I/O bandwidth Peak 4.26 GB/s memory bandwidth Average memory latency for local DRAM 96ns CPUs Int_Rate /Watt CPUs SPECjbb2k /Watt CPUs SPECweb99 /Watt CPUs FP_Rate /Watt All GHz with 6 GB memory, 2 active cores/processor All numbers preliminary based on measured data 6
17 Benchmark: SPECint/fp_rate SPECint_rate/ wat t SPECfp_rat e/ watt GM P GM 2P HM P HM 2P Cel P Cel 2P Xn P Xn 2P 0.00 GM P GM 2P HM P HM 2 P Ce l P Ce l 2P Xn P Xn 2P Sources: AMD Opteron Processor Data Sheet Apr. 2003; Intel Celeron Processor Data Sheet (2-2.6 GHz) Jun. 2003; Intel Xeon Processor Data Sheet (2-3.06GHz) Jul
18 Benchmark: SPECweb, SPECjbb 66 SPECweb/ wat t SPECjbb/ watt GM P GM 2P HM P HM 2P Cel P Cel 2P Xn P Xn 2P GM P GM 2P HM P HM 2P Ce l P Ce l 2P Xn P Xn 2P Sources: AMD Opteron Processor Data Sheet Apr. 2003; Intel Celeron Processor Data Sheet (2-2.6 GHz) Jun. 2003; Intel Xeon Processor Data Sheet (2-3.06GHz) Jul
19 Summary First generation CMT design for highdensity, low-cost, network-facing systems Optimized for throughput rather than single-thread performance Maintains binary compatibility with existing SPARC code base Highly leveraged design methodology 9
20 Q & A 20
Reference. T1 Architecture. T1 ( Niagara ) Case Study of a Multi-core, Multithreaded
Reference Case Study of a Multi-core, Multithreaded Processor The Sun T ( Niagara ) Computer Architecture, A Quantitative Approach, Fourth Edition, by John Hennessy and David Patterson, chapter. :/C:8
More informationOPENSPARC T1 OVERVIEW
Chapter Four OPENSPARC T1 OVERVIEW Denis Sheahan Distinguished Engineer Niagara Architecture Group Sun Microsystems Creative Commons 3.0United United States License Creative CommonsAttribution-Share Attribution-Share
More informationNiagara-2: A Highly Threaded Server-on-a-Chip. Greg Grohoski Distinguished Engineer Sun Microsystems
Niagara-2: A Highly Threaded Server-on-a-Chip Greg Grohoski Distinguished Engineer Sun Microsystems August 22, 2006 Authors Jama Barreh Jeff Brooks Robert Golla Greg Grohoski Rick Hetherington Paul Jordan
More informationJim Keller. Digital Equipment Corp. Hudson MA
Jim Keller Digital Equipment Corp. Hudson MA ! Performance - SPECint95 100 50 21264 30 21164 10 1995 1996 1997 1998 1999 2000 2001 CMOS 5 0.5um CMOS 6 0.35um CMOS 7 0.25um "## Continued Performance Leadership
More information1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola
1. Microprocessor Architectures 1.1 Intel 1.2 Motorola 1.1 Intel The Early Intel Microprocessors The first microprocessor to appear in the market was the Intel 4004, a 4-bit data bus device. This device
More informationLeveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD
Leveraging OpenSPARC ESA Round Table 2006 on Next Generation Microprocessors for Space Applications G.Furano, L.Messina TEC- OpenSPARC T1 The T1 is a new-from-the-ground-up SPARC microprocessor implementation
More informationCS 152 Computer Architecture and Engineering
CS 152 Computer Architecture and Engineering Lecture 19 Advanced Processors III 2006-11-2 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/ 1 Last
More informationSPARC64 X: Fujitsu s New Generation 16 core Processor for UNIX Server
SPARC64 X: Fujitsu s New Generation 16 core Processor for UNIX Server 19 th April 2013 Toshio Yoshida Processor Development Division Enterprise Server Business Unit Fujitsu Limited SPARC64 TM SPARC64 TM
More informationVMware VMmark V1.1 Results
VMware VMmark V1.1 Results Vendor and Hardware Platform: Sun Microsystems Sun Fire X4450 Virtualization Platform: VMware ESX 3.5.0 Update 2 (build 110268) Performance Section Performance Tested By: Sun
More informationThe Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights
The Alpha 21264 Microprocessor: Out-of-Order ution at 600 MHz R. E. Kessler Compaq Computer Corporation Shrewsbury, MA 1 Some Highlights Continued Alpha performance leadership 600 MHz operation in 0.35u
More informationIntel Enterprise Processors Technology
Enterprise Processors Technology Kosuke Hirano Enterprise Platforms Group March 20, 2002 1 Agenda Architecture in Enterprise Xeon Processor MP Next Generation Itanium Processor Interconnect Technology
More informationThe Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA
The Alpha 21264 Microprocessor: Out-of-Order ution at 600 Mhz R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA 1 Some Highlights z Continued Alpha performance leadership y 600 Mhz operation in
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationIntel released new technology call P6P
P6 and IA-64 8086 released on 1978 Pentium release on 1993 8086 has upgrade by Pipeline, Super scalar, Clock frequency, Cache and so on But 8086 has limit, Hard to improve efficiency Intel released new
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationCS 152 Computer Architecture and Engineering
CS 152 Computer Architecture and Engineering Lecture 22 Advanced Processors III 2005-4-12 John Lazzaro (www.cs.berkeley.edu/~lazzaro) TAs: Ted Hong and David Marquardt www-inst.eecs.berkeley.edu/~cs152/
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationA 1.5GHz Third Generation Itanium Processor
A 1.5GHz Third Generation Itanium Processor Jason Stinson, Stefan Rusu Intel Corporation, Santa Clara, CA 1 Outline Processor highlights Process technology details Itanium processor evolution Block diagram
More informationHow much energy can you save with a multicore computer for web applications?
How much energy can you save with a multicore computer for web applications? Peter Strazdins Computer Systems Group, Department of Computer Science, The Australian National University seminar at Green
More informationMulti-Core Microprocessor Chips: Motivation & Challenges
Multi-Core Microprocessor Chips: Motivation & Challenges Dileep Bhandarkar, Ph. D. Architect at Large DEG Architecture & Planning Digital Enterprise Group Intel Corporation October 2005 Copyright 2005
More informationWHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX924 S2
WHITE PAPER PERFORMANCE REPORT PRIMERGY BX924 S2 WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX924 S2 This document contains a summary of the benchmarks executed for the PRIMERGY BX924
More informationPOWER6 Processor and Systems
POWER6 Processor and Systems Jim McInnes jimm@ca.ibm.com Compiler Optimization IBM Canada Toronto Software Lab Role I am a Technical leader in the Compiler Optimization Team Focal point to the hardware
More informationHardware and Software solutions for scaling highly threaded processors. Denis Sheahan Distinguished Engineer Sun Microsystems Inc.
Hardware and Software solutions for scaling highly threaded processors Denis Sheahan Distinguished Engineer Sun Microsystems Inc. Agenda Chip Multi-threaded concepts Lessons learned from 6 years of CMT
More informationSPARC64 X: Fujitsu s New Generation 16 Core Processor for the next generation UNIX servers
X: Fujitsu s New Generation 16 Processor for the next generation UNIX servers August 29, 2012 Takumi Maruyama Processor Development Division Enterprise Server Business Unit Fujitsu Limited All Rights Reserved,Copyright
More informationPOWER7: IBM's Next Generation Server Processor
POWER7: IBM's Next Generation Server Processor Acknowledgment: This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002 Outline
More informationPowerPC 740 and 750
368 floating-point registers. A reorder buffer with 16 elements is used as well to support speculative execution. The register file has 12 ports. Although instructions can be executed out-of-order, in-order
More informationThe AMD64 Technology for Server and Workstation. Dr. Ulrich Knechtel Enterprise Program Manager EMEA
The AMD64 Technology for Server and Workstation Dr. Ulrich Knechtel Enterprise Program Manager EMEA Agenda Direct Connect Architecture AMD Opteron TM Processor Roadmap Competition OEM support The AMD64
More informationPOWER7: IBM's Next Generation Server Processor
Hot Chips 21 POWER7: IBM's Next Generation Server Processor Ronald Kalla Balaram Sinharoy POWER7 Chief Engineer POWER7 Chief Core Architect Acknowledgment: This material is based upon work supported by
More informationVMware VMmark V1.1 Results
VMware VMmark V1.1 Results Vendor and Hardware Platform: IBM BladeCenter LS42 Virtualization Platform: VMware ESX 3.5.0 U3 Build 121023 VMmark V1.1 Score = 16.81 @ 11 Tiles Tested By: IBM Inc., RTP, NC
More informationCS650 Computer Architecture. Lecture 9 Memory Hierarchy - Main Memory
CS65 Computer Architecture Lecture 9 Memory Hierarchy - Main Memory Andrew Sohn Computer Science Department New Jersey Institute of Technology Lecture 9: Main Memory 9-/ /6/ A. Sohn Memory Cycle Time 5
More informationThis Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
More informationBasics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS
Basics DRAM ORGANIZATION DRAM Word Line Bit Line Storage element (capacitor) In/Out Buffers Decoder Sense Amps... Bit Lines... Switching element Decoder... Word Lines... Memory Array Page 1 Basics BUS
More informationHW Trends and Architectures
Pavel Tvrdík, Jiří Kašpar (ČVUT FIT) HW Trends and Architectures MI-POA, 2011, Lecture 1 1/29 HW Trends and Architectures prof. Ing. Pavel Tvrdík CSc. Ing. Jiří Kašpar Department of Computer Systems Faculty
More informationVMware VMmark V1.1 Results
VMware VMmark V1.1 Results Vendor and Hardware Platform: HP ProLiant DL585G5 Virtualization Platform: VMware ESX 4.0 (build 148783) Performance Section Performance Tested By: Hewlett Packard Test Date:
More informationJBus Architecture Overview
JBus Architecture Overview Technical Whitepaper Version 1.0, April 2003 This paper provides an overview of the JBus architecture. The Jbus interconnect features a 128-bit packet-switched, split-transaction
More informationOverview of the SPARC Enterprise Servers
Overview of the SPARC Enterprise Servers SPARC Enterprise Technologies for the Datacenter Ideal for Enterprise Application Deployments System Overview Virtualization technologies > Maximize system utilization
More informationEPYC VIDEO CUG 2018 MAY 2018
AMD UPDATE CUG 2018 EPYC VIDEO CRAY AND AMD PAST SUCCESS IN HPC AMD IN TOP500 LIST 2002 TO 2011 2011 - AMD IN FASTEST MACHINES IN 11 COUNTRIES ZEN A FRESH APPROACH Designed from the Ground up for Optimal
More informationMainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation
Mainstream Computer System Components CPU Core 2 GHz - 3.0 GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation One core or multi-core (2-4) per chip Multiple FP, integer
More informationVMware VMmark V1.1 Results
VMware VMmark V1.1 Results Vendor and Hardware Platform: HP ProLiant DL585G5 Virtualization Platform: VMware ESX 3.5 Update 3 (build 123630) Performance Section Performance Tested By: Hewlett Packard Test
More informationPower Technology For a Smarter Future
2011 IBM Power Systems Technical University October 10-14 Fontainebleau Miami Beach Miami, FL IBM Power Technology For a Smarter Future Jeffrey Stuecheli Power Processor Development Copyright IBM Corporation
More informationPERFORMANCE MEASUREMENT
Administrivia CMSC 411 Computer Systems Architecture Lecture 3 Performance Measurement and Reliability Homework problems for Unit 1 posted today due next Thursday, 2/12 Start reading Appendix C Basic Pipelining
More informationAdvanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Advanced Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000
More informationHP PA-8000 RISC CPU. A High Performance Out-of-Order Processor
The A High Performance Out-of-Order Processor Hot Chips VIII IEEE Computer Society Stanford University August 19, 1996 Hewlett-Packard Company Engineering Systems Lab - Fort Collins, CO - Cupertino, CA
More informationPortland State University ECE 588/688. Cray-1 and Cray T3E
Portland State University ECE 588/688 Cray-1 and Cray T3E Copyright by Alaa Alameldeen 2014 Cray-1 A successful Vector processor from the 1970s Vector instructions are examples of SIMD Contains vector
More informationItanium 2 Processor Microarchitecture Overview
Itanium 2 Processor Microarchitecture Overview Don Soltis, Mark Gibson Cameron McNairy, August 2002 Block Diagram F 16KB L1 I-cache Instr 2 Instr 1 Instr 0 M/A M/A M/A M/A I/A Template I/A B B 2 FMACs
More informationA 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications
A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System
More informationThis Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits
More informationECE 571 Advanced Microprocessor-Based Design Lecture 22
ECE 571 Advanced Microprocessor-Based Design Lecture 22 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 19 April 2018 HW#11 will be posted Announcements 1 Reading 1 Exploring DynamIQ
More informationMainstream Computer System Components
Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved
More informationVMware VMmark V1.1 Results
Vendor and Hardware Platform: IBM System x3850 M2 Virtualization Platform: VMware ESX 3.5.0 U2 Build 103908 VMware VMmark V1.1 Results Tested By: IBM Inc., RTP, NC Test Date: 2008-08-14 Performance Section
More informationFamily 15h Models 10h-1Fh AMD Athlon Processor Product Data Sheet
Family 15h Models 10h-1Fh AMD Athlon Publication # 52422 Revision: 3.00 Issue Date: July 2012 Advanced Micro Devices 2012 Advanced Micro Devices, Inc. All rights reserved. The contents of this document
More informationComputer Architecture. Introduction. Lynn Choi Korea University
Computer Architecture Introduction Lynn Choi Korea University Class Information Lecturer Prof. Lynn Choi, School of Electrical Eng. Phone: 3290-3249, 공학관 411, lchoi@korea.ac.kr, TA: 윤창현 / 신동욱, 3290-3896,
More informationLecture 14: Multithreading
CS 152 Computer Architecture and Engineering Lecture 14: Multithreading John Wawrzynek Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~johnw
More informationUnit 11: Putting it All Together: Anatomy of the XBox 360 Game Console
Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture
More informationMIPS R5000 Microprocessor. Technical Backgrounder. 32 kb I-cache and 32 kb D-cache, each 2-way set associative
MIPS R5000 Microprocessor Technical Backgrounder Performance: SPECint95 5.5 SPECfp95 5.5 Instruction Set ISA Compatibility Pipeline Clock System Interface clock Caches TLB Power dissipation: Supply voltage
More informationSGI Challenge Overview
CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Symmetric Multiprocessors Part 2 (Case Studies) Copyright 2001 Mark D. Hill University of Wisconsin-Madison Slides are derived
More informationSix-Core AMD Opteron Processor
What s you should know about the Six-Core AMD Opteron Processor (Codenamed Istanbul ) Six-Core AMD Opteron Processor Versatility Six-Core Opteron processors offer an optimal mix of performance, energy
More informationAdvanced Processor Architecture
Advanced Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong
More informationThread-level Parallelism for the Masses. Kunle Olukotun Computer Systems Lab Stanford University 2007
Thread-level Parallelism for the Masses Kunle Olukotun Computer Systems Lab Stanford University 2007 The World has Changed Process Technology Stops Improving! Moore s law but! Transistors don t get faster
More informationVMware VMmark V1.1 Results
VMware VMmark V1.1 Results Vendor and Hardware Platform: HP Proliant DL580 G5 Virtualization Platform: VMware ESX 3.5.0 Update 1, build 82663 VMmark V1.1 Score = 14.14 @ 10 Tiles Tested By: Hewlett Packard
More informationThe RM9150 and the Fast Device Bus High Speed Interconnect
The RM9150 and the Fast Device High Speed Interconnect John R. Kinsel Principal Engineer www.pmc -sierra.com 1 August 2004 Agenda CPU-based SOC Design Challenges Fast Device (FDB) Overview Generic Device
More informationA Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures
A Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures W.M. Roshan Weerasuriya and D.N. Ranasinghe University of Colombo School of Computing A Comparative
More informationThe Alpha and Microprocessors:
The Alpha 21364 and 21464 icroprocessors: Continuing the Performance Lead Beyond Y2K Shubu ukherjee, Ph.D. Principal Hardware Engineer VSSAD Labs, Alpha Development Group Compaq Computer Corporation Shrewsbury,
More informationUltraSPARC User s Manual
UltraSPARC User s Manual UltraSPARC-I UltraSPARC-II July 1997 901 San Antonio Road Palo Alto, CA 94303 Part No: 802-7220-02 This July 1997-02 Revision is only available online. The only changes made were
More informationPERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah
PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Sept. 5 th : Homework 1 release (due on Sept.
More informationMainstream Computer System Components
Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-2800 (DDR3-600) 200 MHz (internal base chip clock) 8-way interleaved
More informationSPARC64 VII Fujitsu s Next Generation Quad-Core Processor
SPARC64 VII Fujitsu s Next Generation Quad-Core Processor August 26, 2008 Takumi Maruyama LSI Development Division Next Generation Technical Computing Unit Fujitsu Limited High Performance Technology High
More informationGoro Watanabe. Bill King. OOW 2013 The Best Platform for Big Data and Oracle Database 12c. EVP Fujitsu R&D Center North America
OOW 2013 The Best Platform for Big Data and Oracle Database 12c Goro Watanabe EVP Fujitsu R&D Center North America Bill King EVP Platform Products Group Fujitsu America, Inc. Overview 1. Fujitsu: Quick
More informationAdvanced d Processor Architecture. Computer Systems Laboratory Sungkyunkwan University
Advanced d Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000
More informationVMware VMmark V1.1 Results
VMware VMmark V1.1 Results Vendor and Hardware Platform: Inspur NF5280 Virtualization Platform: VMware ESX Server 4.0 build 148592 Performance Section Performance Tested By: Inspur Inc. Configuration Section
More informationComputer System Components
Computer System Components CPU Core 1 GHz - 3.2 GHz 4-way Superscaler RISC or RISC-core (x86): Deep Instruction Pipelines Dynamic scheduling Multiple FP, integer FUs Dynamic branch prediction Hardware
More informationVMware VMmark V1.1 Results
VMware VMmark V1.1 Results Vendor and Hardware Platform: HP ProLiant DL370 G6 Virtualization Platform: VMware ESX 4.0 build 148783 VMmark V1.1 Score = 23.96 @ 16 Tiles Performance Section Performance Tested
More informationCPUs vs. Memory Hierarchies
CPUs vs. ory Hierarchies eller hur mår Moores lag Erik Hagersten Uppsala Universitet CPU Improvements Moore s Law 2x performance improvm./18 mo (Gordon Moore actually never said this ) Relative Performance
More informationIBM's POWER5 Micro Processor Design and Methodology
IBM's POWER5 Micro Processor Design and Methodology Ron Kalla IBM Systems Group Outline POWER5 Overview Design Process Power POWER Server Roadmap 2001 POWER4 2002-3 POWER4+ 2004* POWER5 2005* POWER5+ 2006*
More informationVMware VMmark V1.1.1 Results
VMware VMmark V1.1.1 Results Vendor and Hardware Platform: SGI XE500 Virtualization Platform: VMware ESX 4.0 Update 1 (build 208167) Performance Section Performance Tested By: SGI Test Date: 02/16/10 Configuration
More informationMemory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory
More informationMemory latency: Affects cache miss penalty. Measured by:
Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory
More informationAgenda. System Performance Scaling of IBM POWER6 TM Based Servers
System Performance Scaling of IBM POWER6 TM Based Servers Jeff Stuecheli Hot Chips 19 August 2007 Agenda Historical background POWER6 TM chip components Interconnect topology Cache Coherence strategies
More informationPower 7. Dan Christiani Kyle Wieschowski
Power 7 Dan Christiani Kyle Wieschowski History 1980-2000 1980 RISC Prototype 1990 POWER1 (Performance Optimization With Enhanced RISC) (1 um) 1993 IBM launches 66MHz POWER2 (.35 um) 1997 POWER2 Super
More informationFamily 15h Models 00h-0Fh AMD Opteron Processor Product Data Sheet
Family 15h Models 00h-0Fh AMD Opteron Publication # 49687 Revision # 3.01 Issue Date October 2012 Advanced Micro Devices 2011, 2012 Advanced Micro Devices Inc. All rights reserved. The contents of this
More informationPicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor
PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor Taeho Kgil, Shaun D Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Steve Reinhardt, Krisztian Flautner,
More informationAMD Opteron 4200 Series Processor
What s new in the AMD Opteron 4200 Series Processor (Codenamed Valencia ) and the new Bulldozer Microarchitecture? Platform Processor Socket Chipset Opteron 4000 Opteron 4200 C32 56x0 / 5100 (codenamed
More informationCS 152 Computer Architecture and Engineering
CS 152 Computer Architecture and Engineering Lecture 22 Advanced Processors III 2004-11-18 Dave Patterson (www.cs.berkeley.edu/~patterson) John Lazzaro (www.cs.berkeley.edu/~lazzaro) www-inst.eecs.berkeley.edu/~cs152/
More informationEE414 Embedded Systems Ch 5. Memory Part 2/2
EE414 Embedded Systems Ch 5. Memory Part 2/2 Byung Kook Kim School of Electrical Engineering Korea Advanced Institute of Science and Technology Overview 6.1 introduction 6.2 Memory Write Ability and Storage
More informationWilliam Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers
William Stallings Computer Organization and Architecture 8 th Edition Chapter 18 Multicore Computers Hardware Performance Issues Microprocessors have seen an exponential increase in performance Improved
More informationLecture 26: Multiprocessing continued Computer Architecture and Systems Programming ( )
Systems Group Department of Computer Science ETH Zürich Lecture 26: Multiprocessing continued Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Today Non-Uniform
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (III)
COSC 6385 Computer Architecture - Memory Hierarchies (III) Edgar Gabriel Spring 2014 Memory Technology Performance metrics Latency problems handled through caches Bandwidth main concern for main memory
More informationSam Naffziger. Gary Hammond. Next Generation Itanium Processor Overview. Lead Circuit Architect Microprocessor Technology Lab HP Corporation
Next Generation Itanium Processor Overview Gary Hammond Principal Architect Enterprise Platform Group Corporation August 27-30, 2001 Sam Naffziger Lead Circuit Architect Microprocessor Technology Lab HP
More informationCOSC 6385 Computer Architecture - Memory Hierarchies (II)
COSC 6385 Computer Architecture - Memory Hierarchies (II) Edgar Gabriel Spring 2018 Types of cache misses Compulsory Misses: first access to a block cannot be in the cache (cold start misses) Capacity
More informationKeyStone II. CorePac Overview
KeyStone II ARM Cortex A15 CorePac Overview ARM A15 CorePac in KeyStone II Standard ARM Cortex A15 MPCore processor Cortex A15 MPCore version r2p2 Quad core, dual core, and single core variants 4096kB
More informationSunFire range of servers
TAKE IT TO THE NTH Frederic Vecoven Sun Microsystems SunFire range of servers System Components Fireplane Shared Interconnect Operating Environment Ultra SPARC & compilers Applications & Middleware Clustering
More informationSnoop-Based Multiprocessor Design III: Case Studies
Snoop-Based Multiprocessor Design III: Case Studies Todd C. Mowry CS 41 March, Case Studies of Bus-based Machines SGI Challenge, with Powerpath SUN Enterprise, with Gigaplane Take very different positions
More informationECE7995 (4) Basics of Memory Hierarchy. [Adapted from Mary Jane Irwin s slides (PSU)]
ECE7995 (4) Basics of Memory Hierarchy [Adapted from Mary Jane Irwin s slides (PSU)] Major Components of a Computer Processor Devices Control Memory Input Datapath Output Performance Processor-Memory Performance
More informationConvergence of Parallel Architecture
Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty
More informationAMD s Next Generation Microprocessor Architecture
AMD s Next Generation Microprocessor Architecture Fred Weber October 2001 Goals Build a next-generation system architecture which serves as the foundation for future processor platforms Enable a full line
More informationPowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors
PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors Peter Sandon Senior PowerPC Processor Architect IBM Microelectronics All information in these materials is subject to
More informationApplication Performance on Dual Processor Cluster Nodes
Application Performance on Dual Processor Cluster Nodes by Kent Milfeld milfeld@tacc.utexas.edu edu Avijit Purkayastha, Kent Milfeld, Chona Guiang, Jay Boisseau TEXAS ADVANCED COMPUTING CENTER Thanks Newisys
More informationMultithreaded Processors. Department of Electrical Engineering Stanford University
Lecture 12: Multithreaded Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 12-1 The Big Picture Previous lectures: Core design for single-thread
More informationA CHIP MULTITHREADED PROCESSOR FOR NETWORK- FACING WORKLOADS
A CHIP MULTITHREADED PROCESSOR FOR NETWORK- FACING WORKLOADS THE ARTICLE DESCRIBES AN EARLY IMPLEMENTATION OF A CHIP MULTITHREADED (CMT) PROCESSOR, SUN MICROSYSTEM S TECHNOLOGY TO SUPPORT ITS THROUGHPUT
More informationThis Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
More information