DIAMOND RINGS ACKNOWLEDGED EVENT PROPAGATION IN MANY-CORE PROCESSORS


1 th August DIAMOND RINGS ACKNOWLEDGED EVENT PROPAGATION IN MANY-CORE PROCESSORS Stefan Nürnberger, Randolf Rotta, Gabor Drescher, Daniel Danner, Jörg Nolte

2 ACKNOWLEDGED EVENT PROPAGATION
What does it do?
  - Make events observable in a networked system
  - Make sure events are globally observable
  - Enforce ordering of events
What is it good for?
  - Memory consistency
  - Coherence protocols
  - Atomic operations
How to implement it?
  - Just use broadcast with acknowledgement...

3 EXAMPLE: READ FOR OWNERSHIP
[Animation, slides 3-13: a variable x resides in memory behind a directory ($Dir); each core C ... Cn has its own cache ($). Successive read(x) requests place copies of x in several caches. One core then issues rfo(x), and the directory sends invalidate(x) to the caches holding a copy.]
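
The read-for-ownership example can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the protocol of any particular processor: the `Directory` class, its method names, and the use of a boolean return value as the acknowledgement are all hypothetical. It only illustrates why the directory has to hear back from every sharer before ownership can be granted.

```python
class Directory:
    """Toy directory that tracks which caches hold a copy of each address."""

    def __init__(self):
        self.sharers = {}  # address -> set of cache ids holding a copy

    def read(self, cache_id, addr):
        # A plain read adds the requesting cache to the sharer set.
        self.sharers.setdefault(addr, set()).add(cache_id)

    def invalidate(self, cache_id, addr):
        # Stand-in for the invalidate message; returning True models the ack.
        self.sharers[addr].discard(cache_id)
        return True

    def read_for_ownership(self, cache_id, addr):
        # Every other sharer must be invalidated and must acknowledge
        # before the requester may be treated as the exclusive owner.
        others = sorted(self.sharers.get(addr, set()) - {cache_id})
        acks = [self.invalidate(other, addr) for other in others]
        if not all(acks):
            raise RuntimeError("missing acknowledgement")
        self.sharers[addr] = {cache_id}
        return others


directory = Directory()
for cache in range(4):
    directory.read(cache, "x")          # four caches read x
invalidated = directory.read_for_ownership(4, "x")
print(invalidated, directory.sharers["x"])
```

The invalidation step is an acknowledged broadcast to all sharers, which is the operation the rest of the talk tries to make cheaper.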

14 OUTLINE
1. Latency & Throughput of Broadcast
2. The Diamond Ring Topology
3. Evaluation

15 THROUGHPUT & LATENCY
Latency
  - time from sending out a message to reception of the acknowledgement
  - determined by the longest path (#hops + processing at each node)
  - lower is better
Throughput
  - number of messages processed within a fixed time span
  - determined by the node with maximum overhead (i.e. the bottleneck)
  - requires pipelining of messages (latency hiding)
  - higher is better
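
As a rough illustration of these two definitions, the sketch below turns them into a simple cost model. The function names, the uniform per-hop and per-message costs, and the assumption of perfect pipelining are simplifications introduced here, not part of the slides.

```python
def broadcast_latency(longest_path_hops, hop_time, processing_time):
    """Latency is set by the longest path: every hop pays transfer plus processing."""
    return longest_path_hops * (hop_time + processing_time)


def broadcast_throughput(max_node_overhead, time_per_message):
    """With perfect pipelining, the busiest node limits broadcasts per time unit."""
    return 1.0 / (max_node_overhead * time_per_message)


# Hypothetical numbers: 6 hops on the longest path, a bottleneck node
# that handles 4 messages per broadcast.
print(broadcast_latency(6, hop_time=1.0, processing_time=0.5))
print(broadcast_throughput(4, time_per_message=0.25))
```

The model makes the trade-off visible: a higher arity shortens the longest path but raises the overhead of the bottleneck node.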

16 ACKNOWLEDGED BROADCAST USING BALANCED TREES

26 ACKNOWLEDGED BROADCAST USING SKEWED TREES

35 ACKNOWLEDGED BROADCAST USING RINGS

47 FORWARD PROCESS ACK
Message forwarding as acknowledgement
  - possible in ring structures
  - halves the number of sent messages (network contention)
  - may increase latency (processing time at node)
Ring structure
  1. Receive message
  2. Process message
  3. Forward message (ack)
Tree structure
  1. Receive message
  2. Forward message (except leaves)
  3. Process message
  4. Receive ack (except leaves)
  5. Forward ack
Not an issue if only message reception needs acknowledgement.
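
The message counts behind the ring and tree procedures can be checked with a small counting sketch. The heap-style numbering of tree nodes and the function names are assumptions made for illustration; the sketch counts messages only and does not model timing.

```python
def ring_messages(n):
    """Count messages for one acknowledged broadcast on a ring of n nodes.

    Each node forwards the message once; the forward that reaches the
    origin again doubles as the acknowledgement.
    """
    sent = 0
    node = 0
    while True:
        node = (node + 1) % n  # forward to the successor
        sent += 1
        if node == 0:          # back at the origin: broadcast is acknowledged
            return sent


def tree_messages(n, k):
    """Count messages for one acknowledged broadcast on a k-ary tree of n nodes.

    Nodes are numbered in heap order (children of i are k*i+1 .. k*i+k).
    Every edge carries one forwarded message and one separate ack.
    """
    forwards = 0
    acks = 0
    for node in range(n):
        children = [c for c in range(k * node + 1, k * node + k + 1) if c < n]
        forwards += len(children)
        if node != 0:
            acks += 1          # every non-root node acks to its parent
    return forwards + acks


print(ring_messages(8), tree_messages(8, 2))
```

For 8 nodes the ring needs 8 messages and the binary tree 14, that is n versus 2(n-1), which is where the "halves the number of sent messages" remark comes from.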


49 THE DIAMOND RING TOPOLOGY
Combine ring and balanced tree
  - Logarithmic path length for low latency
  - Forwarding is acknowledgement
  - Parallel message propagation
  - Computable topology
Diamond ring: directed graph D_k^l
  - k: arity of tree nodes
  - l: levels of tree scattering
  - Based on a balanced tree B_k^l
  - Mirrored at the leaves
  - Closed to a ring at the root
Number of nodes
  |D_k^l| = \frac{(k+1)k^l - (k+1)}{k-1}
  |D_k^{l+1}| = |D_k^l| + k^l + k^{l+1}
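
To make the construction concrete, the following sketch builds a diamond ring as a successor map and compares its size with a closed form. It rests on a reading of the slide that is an assumption here: a scatter tree of arity k with l levels, whose k^l leaves form the center, a mirrored gather tree, and a single root shared by both ends. Under that reading the node count is (k+1)(k^l - 1)/(k-1). The function names and node labels are hypothetical.

```python
def diamond_ring(k, l):
    """Return {node: [successors]} for a diamond ring of arity k and l levels (l >= 1)."""

    def scatter(level, idx):
        # Level l of the scatter tree is the shared center row.
        return ("center", idx) if level == l else ("scatter", level, idx)

    def gather(level, idx):
        # Level 0 of the gather tree is the root again, which closes the ring.
        return ("scatter", 0, 0) if level == 0 else ("gather", level, idx)

    edges = {}
    for level in range(l):                       # scatter half, root included
        for idx in range(k ** level):
            edges[scatter(level, idx)] = [
                scatter(level + 1, k * idx + c) for c in range(k)
            ]
    for idx in range(k ** l):                    # center row feeds the gather half
        edges[("center", idx)] = [gather(l - 1, idx // k)]
    for level in range(l - 1, 0, -1):            # gather half back to the root
        for idx in range(k ** level):
            edges[gather(level, idx)] = [gather(level - 1, idx // k)]
    return edges


def node_count(k, l):
    """Closed form for the number of nodes, valid for k >= 2."""
    return ((k + 1) * k ** l - (k + 1)) // (k - 1)


ring = diamond_ring(2, 3)
print(len(ring), node_count(2, 3))
```

Because every node forwards only after it has received the message, a message arriving back at the root has passed through the whole graph, which is what lets forwarding act as the acknowledgement.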

51 THE PERFECT DIAMOND RING
D - diamond ring with nodes
Stages: root, scatter, center, gather, root

53 THE PERFECT DIAMOND RING
D - diamond ring with nodes + (no bottleneck version)
Stages: root, scatter, center, gather

55 SOME MORE EXAMPLES
D - diamond ring with nodes

58 ACKNOWLEDGED BROADCAST USING DIAMOND RINGS

67 DEALING WITH ODD NODE COUNTS
D - diamond ring with nodes (- nodes)
Stages: root, scatter, center, gather, root

69 DEALING WITH ODD NODE COUNTS ()
D - diamond ring with nodes (+ nodes)
Stages: root, scatter, center, gather, root

71 COMPARISON TO BALANCED TREES
  - Latency is reduced due to a shorter longest path
  - Throughput is increased since nodes have fewer communication partners
  - Contention on the network is reduced since fewer messages are sent

                 Balanced Tree   Diamond Ring      Ring
Longest path     log_k(n)        log_k(n)          n
Max. overhead    2(k+1)          2k                2
Messages sent    2(n-1)          2k/(k+1) n        n

72 COMPARISON TO BALANCED TREES

                 Balanced Tree   Diamond Ring      Ring
Longest path     log_k(n)        log_k(n)          n
Max. overhead    2(k+1)          k+1               2
Messages sent    2(n-1)          2k/(k+1) n + 1    n
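
The comparison can be tried on concrete sizes with a short calculation. The closed forms below are assumptions derived from the structure described on the topology slides (one message per edge in the diamond ring, one forward plus one ack per edge in the tree), and the function names are hypothetical; they are a sketch for experimenting, not figures taken from the talk.

```python
def diamond_ring_stats(k, l):
    """Nodes, messages and longest path (in hops) for a perfect diamond ring, k >= 2."""
    nodes = (k + 1) * (k ** l - 1) // (k - 1)
    messages = 2 * k * (k ** l - 1) // (k - 1)   # one message per edge
    longest_path = 2 * l                         # l scatter hops plus l gather hops
    return nodes, messages, longest_path


def balanced_tree_stats(k, depth):
    """Nodes, messages and longest path (in hops) for a full k-ary tree, k >= 2."""
    nodes = (k ** (depth + 1) - 1) // (k - 1)
    messages = 2 * (nodes - 1)                   # forward and ack on every edge
    longest_path = 2 * depth                     # down to a leaf and back up
    return nodes, messages, longest_path


print(diamond_ring_stats(2, 3))
print(balanced_tree_stats(2, 3))
```

Under these assumptions and with k = 2, a diamond ring of 21 nodes and a balanced tree of 15 nodes both use 28 messages over a longest path of 6 hops, so the diamond ring covers more nodes for the same cost.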


74 EVALUATION OF DIAMOND RINGS
Hypothesis
Acknowledged broadcasts using diamond rings should have
  1. lower latency,
  2. higher throughput
than balanced trees.
Benchmark setup
  - Custom active message framework
  - Messages in shared memory
  - Topologies: Balanced Tree (BT), Diamond Ring (DR), Sequenced Diamond Ring (SDR)
  - Three different evaluation platforms

75 EVALUATION PLATFORMS
  - EZ-Chip Tilera TILE-Gx: in-order cores; low-latency mesh network (UDN)
  - Intel Xeon E v: Sockets, Cores (out-of-order); slotted rings, QPI between sockets
  - Intel Xeon Phi P: Cores (in-order); slotted ring network

76 EZ-CHIP TILERA TILE-GX
[Figure: median latency [µs] over number of cores, and median events per µs over number of pipelined broadcasts; one panel per arity; series BT, DR, SDR]

77 INTEL XEON V
[Figure: median latency [µs] over number of hardware threads, and median events per µs over number of pipelined broadcasts; one panel per arity; series BT, DR, SDR]

78 INTEL XEON PHI P
[Figure: median latency [µs] over number of hardware threads, and median events per µs over number of pipelined broadcasts; one panel per arity; series BT, DR, SDR]

79 RESULTS OVERVIEW
[Figure: median latency [µs] and max median throughput [broadcasts per µs] on TILE Gx, Xeon E v and XeonPhi P; series BT, DR, SDR]

81 SUMMARY
  - Acknowledged event propagation is very important in consistency management.
  - Latency and throughput require a trade-off.
  - Diamond rings offer a better trade-off than balanced trees.
  - Diamond rings are acknowledged broadcast's best friend.
Thank you for your attention! Questions?
This work was supported by the German Research Foundation (DFG) under grant no. NO /- and SCHR /-

Diamond Rings: Acknowledged Event Propagation in Many-Core Processors
Stefan Nürnberger, Randolf Rotta, Gabor Drescher, Daniel Danner, and Jörg Nolte, Brandenburg University of Technology, Cottbus-Senftenberg


More information

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration I/O MultiProcessor Summary 2 Virtual memory benifits Using physical memory efficiently

More information

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger Interconnection Networks: Flow Control Prof. Natalie Enright Jerger Switching/Flow Control Overview Topology: determines connectivity of network Routing: determines paths through network Flow Control:

More information

Chapter 9 Multiprocessors

Chapter 9 Multiprocessors ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Part 6 Input/Output Israel Koren ECE568/Koren Part.6. CPU performance keeps increasing 26 72-core Xeon

More information

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to

More information

Interconnection Networks

Interconnection Networks Interconnection Networks Interconnection Networks Introduction How to connect individual devices together into a group of communicating devices? Device: r r r Component within a computer Single computer

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 Computer Architecture Spring 2016 Lecture 19: Multiprocessing Shuai Wang Department of Computer Science and Technology Nanjing University [Slides adapted from CSE 502 Stony Brook University] Getting More

More information

Database Workload. from additional misses in this already memory-intensive databases? interference could be a problem) Key question:

Database Workload. from additional misses in this already memory-intensive databases? interference could be a problem) Key question: Database Workload + Low throughput (0.8 IPC on an 8-wide superscalar. 1/4 of SPEC) + Naturally threaded (and widely used) application - Already high cache miss rates on a single-threaded machine (destructive

More information

EECS 570 Final Exam - SOLUTIONS Winter 2015

EECS 570 Final Exam - SOLUTIONS Winter 2015 EECS 570 Final Exam - SOLUTIONS Winter 2015 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: # Points 1 / 21 2 / 32

More information

MULTIPROCESSOR OS. Overview. COMP9242 Advanced Operating Systems S2/2013 Week 11: Multiprocessor OS. Multiprocessor OS

MULTIPROCESSOR OS. Overview. COMP9242 Advanced Operating Systems S2/2013 Week 11: Multiprocessor OS. Multiprocessor OS Overview COMP9242 Advanced Operating Systems S2/2013 Week 11: Multiprocessor OS Multiprocessor OS Scalability Multiprocessor Hardware Contemporary systems Experimental and Future systems OS design for

More information

Telematics. 5th Tutorial - LLC vs. MAC, HDLC, Flow Control, E2E-Arguments

Telematics. 5th Tutorial - LLC vs. MAC, HDLC, Flow Control, E2E-Arguments 19531 - Telematics 5th Tutorial - LLC vs. MAC, HDLC, Flow Control, E2E-Arguments Bastian Blywis Department of Mathematics and Computer Science Institute of Computer Science 18. November, 2010 Institute

More information

Chapter 18 - Multicore Computers

Chapter 18 - Multicore Computers Chapter 18 - Multicore Computers Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ Luis Tarrataca Chapter 18 - Multicore Computers 1 / 28 Table of Contents I 1 2 Where to focus your study Luis Tarrataca

More information

Runtime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism

Runtime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism 1 Runtime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism Takeshi Nanri (Kyushu Univ. and JST CREST, Japan) 16 Aug, 2016 4th Annual MVAPICH Users Group Meeting 2 Background

More information

xsim The Extreme-Scale Simulator

xsim The Extreme-Scale Simulator www.bsc.es xsim The Extreme-Scale Simulator Janko Strassburg Severo Ochoa Seminar @ BSC, 28 Feb 2014 Motivation Future exascale systems are predicted to have hundreds of thousands of nodes, thousands of

More information

Chapter 6. Parallel Processors from Client to Cloud Part 2 COMPUTER ORGANIZATION AND DESIGN. Homogeneous & Heterogeneous Multicore Architectures

Chapter 6. Parallel Processors from Client to Cloud Part 2 COMPUTER ORGANIZATION AND DESIGN. Homogeneous & Heterogeneous Multicore Architectures COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 6 Parallel Processors from Client to Cloud Part 2 Homogeneous & Heterogeneous Multicore Architectures Intel XEON 22nm

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568 UNIVERSITY OF MASSACHUSETTS Dept of Electrical & Computer Engineering Computer Architecture ECE 568 art 5 Input/Output Israel Koren ECE568/Koren art5 CU performance keeps increasing 26 72-core Xeon hi

More information

Practical Near-Data Processing for In-Memory Analytics Frameworks

Practical Near-Data Processing for In-Memory Analytics Frameworks Practical Near-Data Processing for In-Memory Analytics Frameworks Mingyu Gao, Grant Ayers, Christos Kozyrakis Stanford University http://mast.stanford.edu PACT Oct 19, 2015 Motivating Trends End of Dennard

More information

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1)

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1) 1 MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1) Chapter 5 Appendix F Appendix I OUTLINE Introduction (5.1) Multiprocessor Architecture Challenges in Parallel Processing Centralized Shared Memory

More information

NETWORK PROBLEM SET Solution

NETWORK PROBLEM SET Solution NETWORK PROBLEM SET Solution Problem 1 Consider a packet-switched network of N nodes connected by the following topologies: 1. For a packet-switched network of N nodes, the number of hops is one less than

More information

HW and SW Architectures for Over-The-Air Dynamic Reconfiguration by Software Download

HW and SW Architectures for Over-The-Air Dynamic Reconfiguration by Software Download Information Technology Center Europe Telecommunications Laboratory HW and SW Architectures for Over-The-Air Dynamic Reconfiguration by Software Download a proof of concept by lab experimentation Christophe

More information

Performance & Scalability Testing in Virtual Environment Hemant Gaidhani, Senior Technical Marketing Manager, VMware

Performance & Scalability Testing in Virtual Environment Hemant Gaidhani, Senior Technical Marketing Manager, VMware Performance & Scalability Testing in Virtual Environment Hemant Gaidhani, Senior Technical Marketing Manager, VMware 2010 VMware Inc. All rights reserved About the Speaker Hemant Gaidhani Senior Technical

More information

Understanding The Performance of DPDK as a Computer Architect

Understanding The Performance of DPDK as a Computer Architect Understanding The Performance of DPDK as a Computer Architect XIAOBAN WU *, PEILONG LI *, YAN LUO *, LIANG- MIN (LARRY) WANG +, MARC PEPIN +, AND JOHN MORGAN + * UNIVERSITY OF MASSACHUSETTS LOWELL + INTEL

More information

Interconnection networks

Interconnection networks Interconnection networks When more than one processor needs to access a memory structure, interconnection networks are needed to route data from processors to memories (concurrent access to a shared memory

More information

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom ISCA 2018 Session 8B: Interconnection Networks Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia Tech (aniruddh@gatech.edu) Tushar

More information

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects 1 EECS 598: Integrating Emerging Technologies with Computer Architecture Lecture 12: On-Chip Interconnects Instructor: Ron Dreslinski Winter 216 1 1 Announcements Upcoming lecture schedule Today: On-chip

More information

SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS

SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CSAIL IAP MEETING MAY 21, 2013 Research Agenda Lack of technology progress Moore s Law still alive Power

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Multi-{Socket,,Thread} Getting More Performance Keep pushing IPC and/or frequenecy Design complexity (time to market) Cooling (cost) Power delivery (cost) Possible, but too

More information

Intel Architecture for HPC

Intel Architecture for HPC Intel Architecture for HPC Georg Zitzlsberger georg.zitzlsberger@vsb.cz 1st of March 2018 Agenda Salomon Architectures Intel R Xeon R processors v3 (Haswell) Intel R Xeon Phi TM coprocessor (KNC) Ohter

More information

Hybrid MPI - A Case Study on the Xeon Phi Platform

Hybrid MPI - A Case Study on the Xeon Phi Platform Hybrid MPI - A Case Study on the Xeon Phi Platform Udayanga Wickramasinghe Center for Research on Extreme Scale Technologies (CREST) Indiana University Greg Bronevetsky Lawrence Livermore National Laboratory

More information

Low-Power Interconnection Networks

Low-Power Interconnection Networks Low-Power Interconnection Networks Li-Shiuan Peh Associate Professor EECS, CSAIL & MTL MIT 1 Moore s Law: Double the number of transistors on chip every 2 years 1970: Clock speed: 108kHz No. transistors:

More information

Policy-Sealed Data: A New Abstraction for Building Trusted Cloud Services

Policy-Sealed Data: A New Abstraction for Building Trusted Cloud Services Max Planck Institute for Software Systems Policy-Sealed Data: A New Abstraction for Building Trusted Cloud Services 1, Rodrigo Rodrigues 2, Krishna P. Gummadi 1, Stefan Saroiu 3 MPI-SWS 1, CITI / Universidade

More information

COMMUNICATION AND I/O ARCHITECTURES FOR HIGHLY INTEGRATED MPSoC PLATFORMS OUTLINE

COMMUNICATION AND I/O ARCHITECTURES FOR HIGHLY INTEGRATED MPSoC PLATFORMS OUTLINE COMMUNICATION AND I/O ARCHITECTURES FOR HIGHLY INTEGRATED MPSoC PLATFORMS Martino Ruggiero Luca Benini University of Bologna Simone Medardoni Davide Bertozzi University of Ferrara In cooperation with STMicroelectronics

More information

Maximizing System x and ThinkServer Performance with a Balanced Memory Configuration

Maximizing System x and ThinkServer Performance with a Balanced Memory Configuration Front cover Maximizing System x and ThinkServer Performance with a Balanced Configuration Last Update: October 2017 Introduces three balanced memory guidelines for Intel Xeon s Compares the performance

More information