Commercially Available Chip Multiprocessors for Research. Welcome to the Multi-core Era

1 4/2/11 Commercially Available Chip Multiprocessors for Research. Bruce Childers, University of Pittsburgh. http:// AAO http:// http:// team.org http://
Welcome to the Multi-core Era. Chip multiprocessors are everywhere: cellular phones, tablets, netbooks, laptops, desktops, servers.

2 Welcome to the Multi-core Era: Qualcomm MSM866. Dual 1.5 GHz Scorpion, GPU & cellular modem. Up to 10giga operations per second. App + media + radio operation. Increasing by 10x every 5 years. 1W available (from total) for computing. Battery power determines limits. May be single to multiple chips. (Diagram: Modem (DMA), GPU (OpenGL), two ARM9 cores in one processor.)
Welcome to the Multi-core Era: Intel Sandy Bridge. NB, 4 cores, 8 HW threads. Integrated GPU, MC. Powerful applications from consumer to science to business. Single processor ("socket"). Moving toward high integration. Moving more toward heterogeneous. (Diagram: four x86 cores, GPU, MC (DDR3) in one processor.)

3 Welcome to the Multi-core Era: AMD Opteron. Cores, 6MB L3. 4 sockets, HyperTransport. Range of services, e.g., cloud computing. Virtualization for server consolidation. Power consumption (effective utilization). Multiple cores per processor. Multiple processors per machine (node). Multiple machines per cabinet. (Diagram: x86 cores, MC (DDR3), INT (HT 3.1) in one processor "socket".)
Important Attributes. Core architecture: core, nearby caches (L1, L2). Uncore architecture: last-level cache (L3), memory, power management, interconnection, graphics processing. The uncore is what can matter for multi-core. It may also soon be the graphics processing capabilities.

4 Intel Processors. Microarchitectures: NetBurst; Core (first Intel dual core); Nehalem, with Nehalem (45nm), Westmere (32nm), Westmere E; Sandy Bridge, with Sandy Bridge (32nm), Ivy Bridge (22nm).
Nehalem: cores, HT, loosely integrated MC/GPU.
Westmere: 6 cores, HT, loosely integrated MC/GPU, VM.
Westmere E: server variant, 1cores, 4 processors (QPI), 2011?
Sandy Bridge: 6 cores, new uarch, closely integrated GPU & MC.
Sandy Bridge. Desktop, mobile & server variants. Features: enhanced core microarchitecture; more closely coupled & integrated components; Hyper-Threading with up to 8 cores (16 threads); on-chip shared L3 cache; Turbo Boost power/speed management. Later server versions will feature improved QuickPath Interconnect.

5 Intel Sandy Bridge. 4 cores with L1, L2, L3 cache. Hyper-threaded: 8 logical cores. Advanced Vector Extensions (256-bit SIMD). Microarchitecture changes (improved branch predictor, changed register renaming for AVX, 2x load ports). (Diagram: cores, GPU, cache, north bridge with PCIe x16 and display, DMI to south bridge.)
Intel Sandy Bridge: L1 instruction cache. 32KB L1 I-cache. Decode 4 x86 instr/cycle, converted to u-ops. 1.5K-entry (L0) u-op cache (just caches, not a trace cache). Gain is power.

6 Intel Sandy Bridge. 32KB L1 data cache. 256KB L2 cache (unified, private).
Intel Sandy Bridge. 8MB L3 cache (shared). Designed for high bandwidth. Shared by cores + GPU. 435 GB/sec at 3.4 GHz*.
* Source: Sandy Bridge Spans Generations, Linley Gwennap, MPR, Sept

7 L3 cache. (Diagram: System Agent with PCIe, display, and memory controller; Core 0 through Core 3, each paired with a 2 MB L3 cache slice; Graphics Processing Unit.)

8 L3 cache. Composed of 4 rings: 32-byte data, request, acknowledgement, snooping. Up to clock traversal. Distributed coherence. (Diagram: System Agent with PCIe and display; Core 0 through Core 3, each paired with a 2 MB L3 cache slice; Graphics Processing Unit.)
Intel Sandy Bridge: graphics processing unit. Integrated on chip. More closely coupled with cores (via L3 cache). New FUs & video codec.
Source: Sandy Bridge Spans Generations, Linley Gwennap, MPR, Sept
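
The 435 GB/sec figure quoted for the shared L3 is consistent with a simple peak-bandwidth calculation. The sketch below is only a back-of-the-envelope check; it assumes (the slides do not say this explicitly) that each of the four 2 MB L3 slices can move one 32-byte transfer per clock at 3.4 GHz. The function name is hypothetical.

```python
def peak_bandwidth_gb_per_s(slices: int, bytes_per_cycle: int, clock_ghz: float) -> float:
    """Peak bandwidth in GB/s: every slice moves bytes_per_cycle on each clock."""
    return slices * bytes_per_cycle * clock_ghz


# Assumed inputs: 4 L3 slices, 32-byte data ring, 3.4 GHz clock.
l3_peak = peak_bandwidth_gb_per_s(slices=4, bytes_per_cycle=32, clock_ghz=3.4)
print(f"Peak L3 bandwidth: {l3_peak:.1f} GB/s")
```

This is a peak number; sustained bandwidth depends on ring contention and on how evenly requests spread across the slices.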

9 Intel Sandy Bridge. Uncore logic to connect to memory, display and I/O. Dual-channel memory.
Source: Sandy Bridge Spans Generations, Linley Gwennap, MPR, Sept. 201
Intel Sandy Bridge: Platform Controller Hub (PCH). Connects to I/O devices, e.g., SATA disk, USB, PCI Express, etc.

10 Turbo Boost Power Management. Thermal design point (TDP): maximum power dissipated. Baseline: consider impact of all cores. But not all cores are always active, so change the power allocation. Introduced in Nehalem. Shifts available budget (under TDP) to boost speed of cores based on workload.
Feedback Loop. Monitoring. Adjust voltage, frequency. Boost to remain under TDP. May temporarily exceed TDP. (Diagram: core state (active, inactive), OS state changes, temperature, power, and estimated current feed the Power Manager, which produces the speed setting.)

11 Feedback Loop: baseline frequency. Cores are active/inactive. Frequency with four cores. OS state change trigger. (Diagram value: 200.)
Feedback Loop: inactive cores. Cores moved to inactive (3/6). Leaves headroom in TDP. Spend on other cores. (Diagram: Power Manager boosts cores.)

12 Feedback Loop: adjust speed upward. Change in small steps (10MHz). Up to maximum speed. Stay under TDP. (Diagram values: 210, then 220; Power Manager boosts cores.)

13 Feedback Loop: adjust speed downward. Move back under TDP. Temporarily exceeds because thermals change slowly. (Diagram values: 230, then 220; Power Manager reduces cores.)
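
The feedback loop described on these slides can be sketched as a small simulation. This is an illustrative toy, not Intel's algorithm: the linear power model, the 55 W TDP, and the 100 MHz step are made-up values, and all function names are hypothetical. Only the base and maximum speeds (2.5 GHz and 3.5 GHz) echo numbers that appear in the slides.

```python
def estimated_power_w(freq_mhz, active_cores, watts_per_core_mhz=0.01):
    """Hypothetical linear power model: power grows with frequency and active cores."""
    return active_cores * freq_mhz * watts_per_core_mhz


def next_speed(freq_mhz, power_w, tdp_w, step_mhz, base_mhz, max_mhz):
    """One pass of the loop: step down if over TDP, otherwise step up toward max."""
    if power_w > tdp_w:
        return max(base_mhz, freq_mhz - step_mhz)
    return min(max_mhz, freq_mhz + step_mhz)


def simulate(active_cores, iterations, tdp_w=55.0, step_mhz=100,
             base_mhz=2500, max_mhz=3500):
    """Run the loop from the base frequency and return the frequency trace."""
    freq = base_mhz
    trace = [freq]
    for _ in range(iterations):
        power = estimated_power_w(freq, active_cores)
        freq = next_speed(freq, power, tdp_w, step_mhz, base_mhz, max_mhz)
        trace.append(freq)
    return trace


print(simulate(active_cores=2, iterations=5))
print(simulate(active_cores=4, iterations=3))
```

With two active cores the trace climbs from 2500 to 2800 MHz, where the modeled power briefly exceeds the budget, and then steps back to 2700 MHz, mirroring the "may temporarily exceed TDP" behavior. With four active cores there is no headroom in this model, so the speed stays at the base frequency.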

14 Feedback Loop: Core i7 2920XM, 4 cores, base 2.5 GHz. Active cores vs. max speed: 3.2 GHz, 3.3 GHz, 3.4 GHz, 3.5 GHz.
AMD Processors. Timeline: Shanghai (2008), Istanbul (2009), Magny-Cours (2010), Bulldozer (2011?). Timeline markers: November 2008; January.
Shanghai: 4 cores (no HWT), 2.9 GHz, 45nm, 6MB L3, DDR2.
Istanbul: 6 cores, 2.8 GHz, 45nm, 6MB L3, DDR2, HT Assist.
Magny-Cours: 12 cores, 2.6 GHz, 45nm, 12MB L3, DDR3, HT Assist.
Bulldozer: tightly coupled cores, separate sched & FUs (HWT-like), 16 cores, 32nm, 16 MB L3, 256-bit FPU.

15 AMD Opteron core x86 processor, Istanbul core architecture. Per package (Multi-chip Module): 12 Istanbul cores; 2 dies (nodes), 6 cores each, 45nm. Per node: 6 MB shared L3; memory controller; 2x memory channels; 4x HyperTransport links. (Diagram: two nodes with I/O links.)

16 AMD Opteron core x86 processor, Istanbul core architecture. Non-uniform memory access. Shared address space, physically distributed to nodes. Transparently access any address. Local address: faster access. Remote address: slower (going across the interconnect).
HyperTransport Links. HyperTransport: point-to-point interconnect (LVDS). Arranged as multiple links (e.g., x16 links). Up to 25.6 GB/second (x32 links). 4 x16 HT ports/processor allocated for within-package communication, cross-processor communication & I/O.
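
The 25.6 GB/second figure for an x32 link follows from the link width and transfer rate. A minimal sketch, assuming the HyperTransport 3.1 maximum of a 3.2 GHz double-data-rate clock (6.4 GT/s per lane) and counting one direction only; the function name is hypothetical:

```python
def ht_link_gb_per_s(width_bits: int, gtransfers_per_s: float = 6.4) -> float:
    """Unidirectional peak bandwidth of a HyperTransport link in GB/s."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * gtransfers_per_s


# Link widths mentioned on the slides: x32, x16, and x8.
for width in (32, 16, 8):
    print(f"x{width}: {ht_link_gb_per_s(width):.1f} GB/s per direction")
```

Under the same assumption, an x16 link peaks at half the x32 rate and an x8 link at a quarter, which is why the narrower diagonal and off-package links are the slower paths for remote memory accesses.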

17 Interconnection (2 processors). 4 x16 HyperTransport links: x16 to adjacent off-package nodes; x8 to diagonal off-package nodes; x16 + x8 between on-package nodes; x16 noncoherent I/O. (Diagram: nodes P0, P1, P2, P3 joined by x16 cHT and x8 cHT links, with x16 ncHT links to I/O.)
Interconnection (4 processors). 4 x16 HyperTransport links: x8 between off-package nodes; x16 + x8 between on-package nodes; x16 noncoherent I/O. (Diagram: eight nodes.)

18 Coherence Traffic. Explosion in coherence traffic: 4 processors, 48 cores! Coherence: data may reside in multiple caches, and it must be kept consistent. Single writer, multiple readers. Broadcast: request which core has the most recent data. Clearly, this doesn't scale well.
HT Assist. Home is the location where a memory address resides. Data can be cached anywhere, though, so the location must be found. Reader: deliver the potentially most recent copy. Writer: get exclusive ownership to update data.

19 HT Assist.
1) P2 requests data from P0 (home)
2) P0 broadcasts for most recent copy
3) Wait for reply from each processor
4) Data forwarded from P1 to P2


21 HT Assist. Maintain a directory of data location. 1MB of L3 is dedicated to the directory. Reduces traffic (location known). (Diagram label: Proc. X: 1.)
The four-step broadcast sequence (request to home, broadcast, wait for a reply from each processor, forward data) is shown alongside a three-step sequence:
1) P2 requests data from P0 (home)
2) P0 broadcasts for most recent copy
3) Data forwarded from P1 to P2
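
To see why a directory reduces traffic, the two request flows can be compared by counting messages. The sketch below is a simplified toy model, not AMD's actual protocol: it assumes the home node probes every other node in the broadcast case and sends a single directed probe to the owner when a directory is available. The message names and functions are hypothetical.

```python
def broadcast_messages(nodes, requester, home, owner):
    """Messages for a read when the home node must probe every other node."""
    others = [n for n in nodes if n != home]
    msgs = [("request", requester, home)]
    msgs += [("probe", home, n) for n in others]   # home broadcasts
    msgs += [("reply", n, home) for n in others]   # wait for each reply
    msgs.append(("data", owner, requester))        # owner forwards the data
    return msgs


def directory_messages(requester, home, owner):
    """Messages for a read when the home node's directory names the owner."""
    return [
        ("request", requester, home),
        ("probe", home, owner),       # directed probe, no broadcast
        ("data", owner, requester),
    ]


nodes = ["P0", "P1", "P2", "P3"]
print(len(broadcast_messages(nodes, "P2", "P0", "P1")), "messages with broadcast")
print(len(directory_messages("P2", "P0", "P1")), "messages with a directory")
```

In this model the broadcast cost grows linearly with the node count (8 messages for 4 nodes, 16 for 8 nodes), while the directory case stays at 3 messages regardless of system size.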


23 HT Assist. Only makes sense for >2 nodes. Avoids most broadcasts. Reduces L3 cache capacity.
HT Assist: Where to Keep Directory? 6 MB L3 cache with 16 ways. 64-byte line with 16 directory entries. 4-byte directory entry (probe filter) with fields Tag, State, Owner. Same processor in 1P and 4P systems. Reduce costs by reusing the L3 cache for the directory: 16 ways, 4 ways dedicated to directory. Sparse directory structure. Maintain coherence state: Modified (owner, dirty); Owned (owner, with sharers); Exclusive (one owner, consistent); Shared (shared, clean/dirty); Invalid (identified by lack of entry).
Source: Hot Chips
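
The directory numbers on these slides can be checked with a little arithmetic. The sketch below uses the figures given in the slides (1MB of L3 for the directory, 4-byte entries, 64-byte lines) and assumes, for illustration, that each directory entry tracks exactly one 64-byte cache line. The constant and variable names are hypothetical.

```python
DIRECTORY_BYTES = 1 * 1024 * 1024   # 1MB of L3 dedicated to the directory
ENTRY_BYTES = 4                     # 4-byte directory entry (probe filter)
LINE_BYTES = 64                     # 64-byte cache line

entries_per_line = LINE_BYTES // ENTRY_BYTES
total_entries = DIRECTORY_BYTES // ENTRY_BYTES
# Assumption: one entry tracks one 64-byte line cached somewhere in the system.
tracked_bytes = total_entries * LINE_BYTES

print(f"{entries_per_line} directory entries per L3 line")
print(f"{total_entries} directory entries in total")
print(f"{tracked_bytes // (1024 * 1024)} MB of cached data can be tracked")
```

The first result matches the "64-byte line with 16 directory entries" statement. Under the one-entry-per-line assumption, giving up 1MB of L3 buys tracking for 16 MB of cached lines, which is the trade-off behind "reduces L3 cache capacity".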

24 AMD Opteron 610. Model / Speed / Cores / ACP / TDP / Price: 618SE 2.5 GHz W 14W $1514 * GHz 12 8W 115 W $1265 * GHz 12 8W 115 W $ GHz 12 8W 115 W $ GHz 8 8W 115 W $ GHz 8 8W 115 W $ HE 1.8 GHz W 85 W $ HE 1.7 GHz W 85 W $ HE 2.2 GHz 8 65 W 85 W $ HE 1.8 GHz 8 65 W 85 W $455
SE: optimized for performance. HE: optimized for low power. ACP: average CPU power (workload-derived power). All have 12 MB L3 (2x 6 MB), HT3, AMD-V. Introduced March 29, 201. * Introduced February 14, 2011.
What's available?
AVA Direct Supermicro SuperServer, $5087: Quad AMD Opteron core 2.GHz (32), 64 GB memory, 50GB SATA drive.
Dell PowerEdge R415, $2457: Dual AMD Opteron 4170HE (6), 2.1 GHz (12), 16 GB memory, 25GB SATA drive.
Dell XPS 830 (desktop), $1453: Intel Core i7 260 (8MB, 3.4 GHz), 16 GB memory, 1TB SATA drive.

25 Summary. Multi-core is certainly here! Significant research challenges: platform infrastructure, core architecture, cache architecture, interconnection, power management, integration and fusing of CPU+GPU. Today's processors offer many of these capabilities for research!


More information

Intel Enterprise Processors Technology

Intel Enterprise Processors Technology Enterprise Processors Technology Kosuke Hirano Enterprise Platforms Group March 20, 2002 1 Agenda Architecture in Enterprise Xeon Processor MP Next Generation Itanium Processor Interconnect Technology

More information

Computer Architecture. Memory Hierarchy. Lynn Choi Korea University

Computer Architecture. Memory Hierarchy. Lynn Choi Korea University Computer Architecture Memory Hierarchy Lynn Choi Korea University Memory Hierarchy Motivated by Principles of Locality Speed vs. Size vs. Cost tradeoff Locality principle Temporal Locality: reference to

More information

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Juan C. Pichel Centro de Investigación en Tecnoloxías da Información (CITIUS) Universidade de Santiago de Compostela, Spain

More information

Platforms Design Challenges with many cores

Platforms Design Challenges with many cores latforms Design hallenges with many cores Raj Yavatkar, Intel Fellow Director, Systems Technology Lab orporate Technology Group 1 Environmental Trends: ell 2 *Other names and brands may be claimed as the

More information

Agenda. What is Ryzen? History. Features. Zen Architecture. SenseMI Technology. Master Software. Benchmarks

Agenda. What is Ryzen? History. Features. Zen Architecture. SenseMI Technology. Master Software. Benchmarks Ryzen Agenda What is Ryzen? History Features Zen Architecture SenseMI Technology Master Software Benchmarks The Ryzen Chip What is Ryzen? CPU chip family released by AMD in 2017, which uses their latest

More information

H61MLV Intel Core i7 LGA 1155 Processor. Intel Core i5 LGA 1155 Processor. Intel Core i3 LGA 1155 Processor. Intel Pentium LGA 1155 Processor

H61MLV Intel Core i7 LGA 1155 Processor. Intel Core i5 LGA 1155 Processor. Intel Core i3 LGA 1155 Processor. Intel Pentium LGA 1155 Processor H61MLV3 8.0 Socket LGA 1155 Supported the Intel 3rd and 2nd generation Core i7/ i5/ i3 processors in the 1155 package Supported 2 DIMM of DDR3 1600/1333/1066MHz Supports BIO-Remote 2 Technology Chipset

More information

Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano Outline The problem of cache coherence Snooping protocols Directory-based protocols Prof. Cristina Silvano, Politecnico

More information

A Case Study in Optimizing GNU Radio s ATSC Flowgraph

A Case Study in Optimizing GNU Radio s ATSC Flowgraph A Case Study in Optimizing GNU Radio s ATSC Flowgraph Presented by Greg Scallon and Kirby Cartwright GNU Radio Conference 2017 Thursday, September 14 th 10am ATSC FLOWGRAPH LOADING 3% 99% 76% 36% 10% 33%

More information

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache

More information

Dell PowerEdge 11 th Generation Servers: R810, R910, and M910 Memory Guidance

Dell PowerEdge 11 th Generation Servers: R810, R910, and M910 Memory Guidance Dell PowerEdge 11 th Generation Servers: R810, R910, and M910 Memory Guidance A Dell Technical White Paper Dell Product Group Armando Acosta and James Pledge THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES

More information

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Motivation: Multi-Cores where and why Introduction: Moore s law Intel

More information

Managing Data Center Power and Cooling

Managing Data Center Power and Cooling Managing Data Center Power and Cooling with AMD Opteron Processors and AMD PowerNow! Technology Avoiding unnecessary energy use in enterprise data centers can be critical for success. This article discusses

More information

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management Next-Generation Mobile Computing: Balancing Performance and Power Efficiency HOT CHIPS 19 Jonathan Owen, AMD Agenda The mobile computing evolution The Griffin architecture Memory enhancements Power management

More information

Pactron FPGA Accelerated Computing Solutions

Pactron FPGA Accelerated Computing Solutions Pactron FPGA Accelerated Computing Solutions Intel Xeon + Altera FPGA 2015 Pactron HJPC Corporation 1 Motivation for Accelerators Enhanced Performance: Accelerators compliment CPU cores to meet market

More information

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware Overview: Shared Memory Hardware Shared Address Space Systems overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and

More information

Overview: Shared Memory Hardware

Overview: Shared Memory Hardware Overview: Shared Memory Hardware overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and update protocols false sharing

More information

POWER YOUR CREATIVITY WITH THE INTEL CORE X-SERIES PROCESSOR FAMILY

POWER YOUR CREATIVITY WITH THE INTEL CORE X-SERIES PROCESSOR FAMILY Product Brief POWER YOUR CREATIVITY WITH THE INTEL CORE X-SERIES PROCESSOR FAMILY The Ultimate Creator PC Platform Made to create, the latest X-series processor family is powered by up to 18 cores and

More information

PC I/O. May 7, Howard Huang 1

PC I/O. May 7, Howard Huang 1 PC I/O Today wraps up the I/O material with a little bit about PC I/O systems. Internal buses like PCI and ISA are critical. External buses like USB and Firewire are becoming more important. Today also

More information

Modern CPU Architectures

Modern CPU Architectures Modern CPU Architectures Alexander Leutgeb, RISC Software GmbH RISC Software GmbH Johannes Kepler University Linz 2014 16.04.2014 1 Motivation for Parallelism I CPU History RISC Software GmbH Johannes

More information

HyperTransport Technology

HyperTransport Technology HyperTransport Technology in 2009 and Beyond Mike Uhler VP, Accelerated Computing, AMD President, HyperTransport Consortium February 11, 2009 Agenda AMD Roadmap Update Torrenza, Fusion, Stream Computing

More information

The Next Revolution in Computer Systems Architecture

The Next Revolution in Computer Systems Architecture The Next Revolution in Computer Systems Architecture Richard Oehler Corporate Fellow Office of the CTO University of Mannheim 2/08/07 Computer Systems Architecture Not just the Processor Chip It s all

More information

DART- CUDA: A PGAS RunAme System for MulA- GPU Systems

DART- CUDA: A PGAS RunAme System for MulA- GPU Systems DART- CUDA: A PGAS RunAme System for MulA- GPU Systems Lei Zhou, Karl Fürlinger presented by Ma#hias Maiterth Ludwig- Maximilians- Universität München (LMU) Munich Network Management Team (MNM) InsAtute

More information

CSC501 Operating Systems Principles. OS Structure

CSC501 Operating Systems Principles. OS Structure CSC501 Operating Systems Principles OS Structure 1 Announcements q TA s office hour has changed Q Thursday 1:30pm 3:00pm, MRC-409C Q Or email: awang@ncsu.edu q From department: No audit allowed 2 Last

More information

FAST FORWARD TO YOUR <NEXT> CREATION

FAST FORWARD TO YOUR <NEXT> CREATION FAST FORWARD TO YOUR CREATION THE ULTIMATE PROFESSIONAL WORKSTATIONS POWERED BY INTEL XEON PROCESSORS 7 SEPTEMBER 2017 WHAT S NEW INTRODUCING THE NEW INTEL XEON SCALABLE PROCESSOR BREAKTHROUGH PERFORMANCE

More information

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence 1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations

More information

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration I/O MultiProcessor Summary 2 Virtual memory benifits Using physical memory efficiently

More information

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors

More information

COSC 6385 Computer Architecture - Multi Processor Systems

COSC 6385 Computer Architecture - Multi Processor Systems COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:

More information

Copyright 2014 Splunk Inc. Splunk for VMware. Architecture & Design. Michael Donnelly, Sr. Sales Engineer

Copyright 2014 Splunk Inc. Splunk for VMware. Architecture & Design. Michael Donnelly, Sr. Sales Engineer Copyright 2014 Splunk Inc. Splunk for VMware Architecture & Design Michael Donnelly, Sr. Sales Engineer Disclaimer During the course of this presentaeon, we may make forward looking statements regarding

More information

CPU Architecture Overview. Varun Sampath CIS 565 Spring 2012

CPU Architecture Overview. Varun Sampath CIS 565 Spring 2012 CPU Architecture Overview Varun Sampath CIS 565 Spring 2012 Objectives Performance tricks of a modern CPU Pipelining Branch Prediction Superscalar Out-of-Order (OoO) Execution Memory Hierarchy Vector Operations

More information

BREAKING THE MEMORY WALL

BREAKING THE MEMORY WALL BREAKING THE MEMORY WALL CS433 Fall 2015 Dimitrios Skarlatos OUTLINE Introduction Current Trends in Computer Architecture 3D Die Stacking The memory Wall Conclusion INTRODUCTION Ideal Scaling of power

More information

Server Sizing Joe Chang qdpma.com Jchang6 at yahoo

Server Sizing Joe Chang qdpma.com Jchang6 at yahoo Server Sizing 2018 Joe Chang qdpma.com Jchang6 at yahoo About Joe SQL Server consultant since 1999 Query Optimizer execution plan cost formulas (2002) True cost structure of SQL plan operations (2003?)

More information

IP Device Integration Notes

IP Device Integration Notes IP Device Integration Notes Article ID: V1-15-01-20-t Release Date: 01/20/2015 Applied to GV-VMS V14.10 Summary The document consists of three sections: 1. The total frame rate and the number of channels

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

Scalable Cache Coherence. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Scalable Cache Coherence. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Scalable Cache Coherence Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hierarchical Cache Coherence Hierarchies in cache organization Multiple levels

More information

Vector Engine Processor of SX-Aurora TSUBASA

Vector Engine Processor of SX-Aurora TSUBASA Vector Engine Processor of SX-Aurora TSUBASA Shintaro Momose, Ph.D., NEC Deutschland GmbH 9 th October, 2018 WSSP 1 NEC Corporation 2018 Contents 1) Introduction 2) VE Processor Architecture 3) Performance

More information

White Paper. First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem)

White Paper. First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem) White Paper First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem) Introducing a New Dynamically and Design- Scalable Microarchitecture that Rewrites the Book On Energy Efficiency

More information

COSC 6385 Computer Architecture - Thread Level Parallelism (III)

COSC 6385 Computer Architecture - Thread Level Parallelism (III) OS 6385 omputer Architecture - Thread Level Parallelism (III) Edgar Gabriel Spring 2018 Some slides are based on a lecture by David uller, University of alifornia, Berkley http://www.eecs.berkeley.edu/~culler/courses/cs252-s05

More information

Key Points. Rotational delay vs seek delay Disks are slow. Techniques for making disks faster. Flash and SSDs

Key Points. Rotational delay vs seek delay Disks are slow. Techniques for making disks faster. Flash and SSDs IO 1 Today IO 2 Key Points CPU interface and interaction with IO IO devices The basic structure of the IO system (north bridge, south bridge, etc.) The key advantages of high speed serial lines. The benefits

More information