The ExaNeSt Project: Interconnects, Storage, and Packaging for Exascale Systems

1 The ExaNeSt Project: Interconnects, Storage, and Packaging for Exascale Systems. M. Katevenis, Nikolaos Chrysos, et al., Foundation for Research & Technology - Hellas (FORTH), on behalf of the ExaNeSt Consortium. Euromicro DSD 2016, Limassol, Aug. 31.

2 The ExaNeSt Consortium. Partner countries: Germany, Netherlands, Italy, UK, Greece (coordinator), Spain (UPV). Areas covered: storage & data, applications, technology, interconnects.

3 What ExaNeSt is about. ARMv8, UNIMEM Partitioned Global Address Space (PGAS): low-energy compute, low-overhead communication; heterogeneous: FPGA accelerators; working closely with ExaNoDe, EcoScale (& EuroServer). Network: unified compute & storage, low latency. Storage: distributed, in-node non-volatile memories. Extreme compute density: totally-liquid cooling. Prototype: 1K cores, 4 TBytes DRAM, 40 TBytes SSD, 0.5 M DSP slices. Real applications: scientific, engineering, data analytics.

4 The ExaNeSt Prototype. Using Xilinx Zynq UltraScale+ FPGAs; four 64-bit ARM cores per FPGA. Quad FPGA Daughter Boards (QFDB): four FPGAs per QFDB; 8 QFDBs per blade. System: a dozen blades.

5 The ExaNeSt Prototype. Using Xilinx Zynq UltraScale+ FPGAs; four 64-bit ARM cores per FPGA. Electronics immersed in 3M Novec liquid. Quad FPGA Daughter Boards (QFDB): four FPGAs per QFDB; 8 QFDBs per blade. System: a dozen blades. Rack-level water circulation.
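
The build-up on the slide above can be checked with a few multiplications. This is only a sketch: it assumes that "a dozen blades" means exactly 12 and that every blade carries 8 fully populated QFDBs; the constant and function names are illustrative and do not come from the project.

```python
# Fan-outs as stated on the prototype slide.
CORES_PER_FPGA = 4
FPGAS_PER_QFDB = 4
QFDBS_PER_BLADE = 8
BLADES = 12  # assumption: "a dozen blades" taken as exactly 12


def prototype_totals():
    """Multiply the per-level fan-outs to get system-wide counts."""
    qfdbs = QFDBS_PER_BLADE * BLADES
    fpgas = FPGAS_PER_QFDB * qfdbs
    cores = CORES_PER_FPGA * fpgas
    return {"qfdbs": qfdbs, "fpgas": fpgas, "cores": cores}


if __name__ == "__main__":
    print(prototype_totals())
```

Under these assumptions the count comes to 96 QFDBs, 384 FPGAs and 1536 ARM cores. The deck's summary figure is "1K cores", so the populated configuration may well be smaller than this fully loaded case.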

6 ExaNeSt: Unimem PGAS Memory Model. Enables remote loads/stores to a global address space. System-wide coherent memories without expensive hardware: only one node may cache data. Global virtual address space. Resiliency: pages can move seamlessly upon node failures. Difficult to maintain a global page table.

7 ExaNeSt Unimem Implementation. Enables remote loads/stores to a global address space. System-wide coherent memories without expensive hardware: only one node may cache data. Global virtual address space. Resiliency: pages can move seamlessly upon node failures. Difficult to maintain a global page table. ExaNeSt: pages stay within a coherence island (node).
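
The two rules on the slide above (any node may load/store any global address; only one node may cache a given page) can be illustrated with a toy software model. This is not the ExaNeSt hardware design: the address split `OFFSET_BITS`, the class names and the write-through policy are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field

OFFSET_BITS = 40  # hypothetical split: low bits address memory inside one node


def make_global_address(node_id, offset):
    """Pack an owner node id and a node-local offset into one global address."""
    return (node_id << OFFSET_BITS) | offset


def split_global_address(addr):
    """Return (owner node id, offset inside that node) for a global address."""
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)


@dataclass
class Node:
    node_id: int
    memory: dict = field(default_factory=dict)  # offset -> value (local DRAM)
    cache: dict = field(default_factory=dict)   # offset -> value (owner-only cache)


class ToyUnimem:
    """Every access is served by the owner node; requesters never cache remote data."""

    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def store(self, requester_id, addr, value):
        owner_id, offset = split_global_address(addr)
        owner = self.nodes[owner_id]
        # Applied at the owner whoever issued it, so there is no remote copy
        # to invalidate. Write-through keeps the toy simple.
        owner.cache[offset] = value
        owner.memory[offset] = value

    def load(self, requester_id, addr):
        owner_id, offset = split_global_address(addr)
        owner = self.nodes[owner_id]
        if offset not in owner.cache:
            owner.cache[offset] = owner.memory.get(offset, 0)
        return owner.cache[offset]
```

Because data is only ever cached at its owner, all nodes observe the same value without a system-wide coherence protocol, which is the property the slide attributes to Unimem.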

8 ExaNeSt Package (Coherence Island): Xilinx Zynq UltraScale+. Trenz board with Xilinx Zynq UltraScale+ FPGA. ExaNeSt prototype: among the first to use 64-bit ARM FPGAs. Processing at 1.2 GHz: 4 Cortex-A53 ARM cores, 4.8 GFLOPS; plus real-time processors (Cortex-R5), IOMMU, virtualized DMA engine. Programmable logic: 2.5K DSP add-mul at 300 MHz, 250 to 1K GFLOPS.

9 ExaNeSt Node: Quad-FPGA-DaughterBoard (QFDB). 4 UltraScale+ FPGAs with all-to-all connectivity: 2 x HSS (GTH) + 16 x LVDS. 64 GBytes DDR ([?] Gb/s). 512 GBytes SSD/NVMe: 4x PCIe v2 (8 GBytes/s). 10 remote HSS links: 10 Gb/s per link, 16 Gb/s best case. Board size 120 x 130 mm². Currently in layout + fabrication.

10 ExaNeSt Blade: Packaging and Cooling Unit. Initially: 4 QFDBs + 2 KALEAO + 2 thermal-only DBs for tests. Later: 16 QFDB-compatible slots. Blade mezzanine board: passive interconnect among local QFDBs; custom mesh-like network; 32 SFP+ (cable slots) for system interconnect; 500+ Gb/s per blade. Figure labels: blade, mezzanine board, QFDB, PCB HSS links, SFP+ cables.

11 Flexible System-Interconnect Topologies. Tier 1: blade; Tier 2: system. Multi-level Dragonfly (QFDB, blade, system): small diameter, high bisection, few global wires. Hybrid direct + indirect networks: Dragonfly + central routing boards; segregate throughput-sensitive from latency-sensitive traffic.

12 ExaNeSt: Interconnection Network. Design goals: low latency; RDMA: true zero copy; flow prioritization: short (compute) vs bulky (storage); throttle congestive flows at the network edges, at DMA sources; resiliency: error detect/correct, monitor links, multipath routing; all-optical proof-of-concept switch using 2x2/4x4 building blocks.
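
The slide above names flow prioritization (short compute messages vs bulky storage transfers) as a goal but does not say how it is implemented. One common way to get that behaviour is a strict-priority scheduler with two queues, sketched below. The size threshold `SHORT_LIMIT_BYTES` and the class name are hypothetical, chosen only for illustration.

```python
from collections import deque

SHORT_LIMIT_BYTES = 4096  # hypothetical cut-off between "short" and "bulky"


class TwoClassScheduler:
    """Strict priority: bulky transfers are served only when no short message waits."""

    def __init__(self):
        self.short = deque()  # latency-sensitive (compute) messages
        self.bulk = deque()   # throughput-oriented (storage) transfers

    def enqueue(self, name, size_bytes):
        queue = self.short if size_bytes <= SHORT_LIMIT_BYTES else self.bulk
        queue.append((name, size_bytes))

    def dequeue(self):
        """Return the next (name, size) to transmit, or None if both queues are empty."""
        if self.short:
            return self.short.popleft()
        if self.bulk:
            return self.bulk.popleft()
        return None
```

Strict priority can starve the bulk class under sustained short traffic; a weighted scheme would trade some latency for fairness. Which trade-off ExaNeSt makes is not stated in the slides.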

13 ExaNeSt Interconnect Hierarchy (flattened table; values listed per column)
Hierarchy: Tier 4 System; Tier 3 Rack/Cabinet; Tier 2 Backplane/Chassis; Tier 1 Blade/Mezzanine; Tier 0 QFDB Node; Unit Package (UltraScale+ FPGA); Chip-2-Chip
Tech / Switching: Optical; AXI Load/Store, weak order; Optical; Optical; AXI Xbar; Ethernet; Ethernet; Ethernet; AXI Xbar; APEnet; APEnet; APEnet; APEnet
Fanout: T1-T2; >200 racks; 5-15 chassis; 6-24 nodes/1U; 4-16 nodes; 4 FPGAs; 4-6 SMP cores
Bandwidth: LVDS 12x (14.4 Gbps); HSS 2x (32 Gbps); 40 Gbps
Lat: low; 20 ns; 200 ns
Address Scheme: Custom; MAC address; GAS^r, MAC to GAS partition; GAS^r partition
Reliability: X; X; EDC; LO; FA; MO; ACK; Bit Corruption
Other labels: A53 Cores; OK; FAST; OK

14 ExaNeSt Storage Architecture

15 ExaNeSt Storage Architecture. QFDBs with SSD storage. Bring data closer to compute, inside QFDB-level SSDs.

16 ExaNeSt: Per-Job On-Demand SSD Caches. File payload: the SSDs act as a cache; on a miss, the request goes to the storage server.
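
The cache behaviour on the slide above (serve file payload from the in-node SSD, fall back to the storage server on a miss) is the classic read-through pattern. The sketch below models it with in-memory dictionaries; `StorageServer`, `JobSSDCache` and the file path are hypothetical stand-ins, not ExaNeSt interfaces, and eviction and write handling are left out.

```python
class StorageServer:
    """Stand-in for the remote storage server; counts how often it is hit."""

    def __init__(self, files):
        self.files = dict(files)
        self.reads = 0

    def read(self, path):
        self.reads += 1
        return self.files[path]


class JobSSDCache:
    """Per-job read-through cache, standing in for the QFDB-level SSDs."""

    def __init__(self, server):
        self.server = server
        self.blocks = {}  # path -> payload held on the local SSD
        self.hits = 0
        self.misses = 0

    def read(self, path):
        if path in self.blocks:
            self.hits += 1
            return self.blocks[path]
        # Miss: fetch from the storage server and keep a local copy.
        self.misses += 1
        data = self.server.read(path)
        self.blocks[path] = data
        return data


if __name__ == "__main__":
    server = StorageServer({"/job/input.dat": b"abc"})
    cache = JobSSDCache(server)
    cache.read("/job/input.dat")  # miss, goes to the server
    cache.read("/job/input.dat")  # hit, served locally
    print(cache.hits, cache.misses, server.reads)
```

Creating the cache per job, as the slide title suggests, means it can simply be discarded when the job ends.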

17 Applications, Traces. Main applications: material science: LAMMPS; climate change: RegCM; engineering: OpenFOAM, SailFish; astrophysics: Gadget, Pinocchio, ChaNGa, Swift; neuroscience: DPSNN; high-energy physics: LQCD; data analytics: MonetDB. Traces generated: Scalasca profiling tool; MPI calls instrumented; several GBytes per trace, filtered down to tens of MBytes by keeping what our network simulators will need; generally, to be made publicly available. Next: application porting & tuning: currently porting selected apps to ARM, on the EuroServer prototype.

18 Conclusions: ExaNeSt TODOs. Optimize, integrate & evaluate core system-level components: ARM/Unimem; packaging & cooling; interconnects; distributed NVM / storage; fine-tuned applications. Large-scale optimized 64-bit ARM prototype, also leveraged by other FET-HPC projects: ExaNoDe and EcoScale.

19 European Exascale System Interconnect & Storage: interconnection network, in-node storage, advanced cooling, real applications. Stay

20 Backup

21 ExaNeSt RDMA Operation Overview. From user space to user space: no kernel, no copies. No page pinning, to avoid OS overheads: the destination may page-fault. Source DMA channels implement rate-based congestion control.
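
Throttling at the source DMA channel, as mentioned on the slide above, is often done with a token bucket. The slides do not describe the actual mechanism, so the sketch below is only one plausible shape of it; the class name and the rate and burst parameters are assumptions. Time is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Rate limiter for one source DMA channel (illustrative, not the ExaNeSt design)."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time of the previous refill, in seconds

    def try_send(self, nbytes, now):
        """Return True and consume tokens if nbytes may be injected at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should retry later; nothing is consumed
```

A congestion signal from the network could lower `rate` for the offending channel, which confines the slowdown to the flow that causes congestion rather than to every flow sharing the link.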

22 Potential for International Cooperations. Application Programming Interfaces (APIs) are needed for taking advantage of new technologies: NVMs / distributed storage; zero-copy, user-level communication (RDMA, mailboxes); congestion mitigation & resilience in networks.

23 Relations with cPPP, SRA, other Projects. cPPP: we feel part of it and a contributor to it; cPPP is necessary for our goals. SRA: very useful for planning; some of our partners are already contributors, more to come. Relations with other FETHPC/CoE: already within a group with ExaNoDe & EcoScale; minimal collaboration axis: low-energy (ARM), UNIMEM; looking forward to widening the group on this axis; looking forward to follow-up projects & EsD on the same axis. Application CoEs are essential for HW-SW co-design. Also need a CoE on HP Computing Systems Arch & SysSW.

24 ExaNeSt RDMA Receive Context Table

25 ExaNeSt System Hierarchy (flattened table; columns: Hierarchy, Scale, Performance, DRAM, Storage, MaxPower)
Hierarchy levels: Chiplet (DoA): heterogeneous CPU/GPU compute unit; Interposer (3D-IC): 4 x Chiplet (DoA); Compute Node (shared I/O & accel.): 2 Interposer plus I/O + OpenCL FPGA; Package ('16-17): Xilinx Zynq UltraScale+ FPGA XCZU9EG, CPU/GPU/DSP; Package (2020+): 2+ FPGAs on MCM, new technology; Compute Element (DB PCB): 2 x Node; Daughter Board (New Tech)
Values as extracted: GFLOPS 8 CPUs; GFLOPS 32 CPUs; 1 packages; 3.5 TFLOPS 64 CPUs; 1 package; 1 package; 2 packages; 4 packages; 250 GFLOPS 4 ARM-53 CPUs (GPU + 2.5K DSPs); 1.5 TFLOPS 32 CPUs; 7 TFLOPS 128 CPUs; Up to 6x 8GB virtualized; 15 W; (16 GB) 64 GB virtualized; 70 W; 128 GB; 18 GBytes DDR4; 64 GBytes HMC; 6 TFLOPS 128 CPUs; 256 GB; Host SSD; GB virtualized; virtualized; 140 W + 20 W for I/O; 20 W; 50 W; 256 GB; 6.8 TB; 320 W; SSD 4TB; 200 W

26 ExaNeSt System Hierarchy
Hierarchy | Scale | Performance | DRAM | Storage | MaxPower
Daughter Board (New Tech) | 4 packages | 6 TFLOPS, 128 CPUs | 256 GB | SSD 4 TB | 200 W
Mezzanine (motherboard for Elements): 4 x Element | 8 packages | 28 TFLOPS, 512 CPUs | 1 TB | 27 TB | 1.28 kW + [?] W interconnect
Blade (deployment unit / hot-swap): 3 x Mezzanine | 24 packages | 84 TFLOPS, 1536 CPUs | 3 TB | 81 TB | 4.2 kW + [?] W cooling
Blade: 16 x DaughterBoards | 64 packages | 96 TFLOPS, 2K CPUs | 4 TB | 64 TB | 3.2 kW
Chassis: 6 x Blade + 2 NetBlades | 384 packages | 576 TFLOPS, 12.2K CPUs | 24 TB | 384 TB | 25.6 kW + 5 kW cooling

27 ExaNeSt System Hierarchy
Hierarchy | Scale | Performance | DRAM | Storage | MaxPower
Chassis: 6 x Blade + 2 NetBlades | 384 packages | 576 TFLOPS, 12.2K CPUs | 24 TB | 384 TB | 25.6 kW + 5 kW cooling
Rack (metal frame): 72 Blade | 1728 packages | 6 PFLOPS, 110K CPUs | 221 TB | 5.8 PB | 324 kW + 1 kW TOR
Rack (metal frame): 12 x Chassis | 4608 packages | 6.9 PFLOPS, 147K CPUs | 288 TB | 4.5 PB | 367 kW
Example HPC System: 100 x Rack | 173K packages | 500 PFLOPS, 11M CPUs | 22 PB | 58 PB | 32.5 MW
ExaScale Level: 167 x Rack | 288K packages | 1 ExaFLOPS, 18.5M CPUs | 37 PB | 1 ExaByte | 54 MW
Example HPC System: 100 Rack | 460K packages | 690 PFLOPS, 14.7M CPUs | 28.8 PB | 450 PB | 37 MW
Exascale: 144 x Rack | 663K packages | 1 ExaFLOPS, 21M CPUs | 41 PB | 684 PB | 53 MW
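
The chassis-based rows of the slide above follow from linear scaling, which the sketch below reproduces. It assumes every quantity scales proportionally with unit count and that chassis power is the 25.6 kW compute figure plus the 5 kW cooling figure; the dictionary keys and the `scale` helper are illustrative names.

```python
# Per-chassis figures from the slide (6 x Blade + 2 NetBlades).
CHASSIS = {
    "packages": 384,
    "tflops": 576,
    "dram_tb": 24,
    "storage_tb": 384,
    "power_kw": 25.6 + 5.0,  # compute plus cooling
}


def scale(unit, factor):
    """Scale every figure of a building block by an integer count of units."""
    return {key: value * factor for key, value in unit.items()}


rack = scale(CHASSIS, 12)   # "Rack: 12 x Chassis"
system = scale(rack, 144)   # "Exascale: 144 x Rack"

if __name__ == "__main__":
    print(rack["packages"], round(rack["tflops"] / 1000, 1), round(rack["power_kw"]))
    print(system["packages"], round(system["tflops"] / 1e6, 2), round(system["power_kw"] / 1000))
```

With these inputs the rack comes to 4608 packages, about 6.9 PFLOPS and about 367 kW, and 144 such racks reach roughly 1 ExaFLOPS at about 53 MW, in line with the corresponding rows. The storage column does not scale as cleanly (12 x 384 TB is slightly above the 4.5 PB quoted), which may reflect rounding or unit conventions in the original table.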

RapidIO.org Update. Mar RapidIO.org 1

RapidIO.org Update. Mar RapidIO.org 1 RapidIO.org Update rickoco@rapidio.org Mar 2015 2015 RapidIO.org 1 Outline RapidIO Overview & Markets Data Center & HPC Communications Infrastructure Industrial Automation Military & Aerospace RapidIO.org

More information

EuroEXA Driving the technology towards exascale

EuroEXA Driving the technology towards exascale EuroEXA Driving the technology towards exascale John Goodacre Professor of Computer Architectures Advanced Processor Technologies Group University of Manchester This presentation summarises my personal

More information

RapidIO.org Update.

RapidIO.org Update. RapidIO.org Update rickoco@rapidio.org June 2015 2015 RapidIO.org 1 Outline RapidIO Overview Benefits Interconnect Comparison Ecosystem System Challenges RapidIO Markets Data Center & HPC Communications

More information

Maximizing heterogeneous system performance with ARM interconnect and CCIX

Maximizing heterogeneous system performance with ARM interconnect and CCIX Maximizing heterogeneous system performance with ARM interconnect and CCIX Neil Parris, Director of product marketing Systems and software group, ARM Teratec June 2017 Intelligent flexible cloud to enable

More information

Signal Conversion in a Modular Open Standard Form Factor. CASPER Workshop August 2017 Saeed Karamooz, VadaTech

Signal Conversion in a Modular Open Standard Form Factor. CASPER Workshop August 2017 Saeed Karamooz, VadaTech Signal Conversion in a Modular Open Standard Form Factor CASPER Workshop August 2017 Saeed Karamooz, VadaTech At VadaTech we are technology leaders First-to-market silicon Continuous innovation Open systems

More information

Building supercomputers from embedded technologies

Building supercomputers from embedded technologies http://www.montblanc-project.eu Building supercomputers from embedded technologies Alex Ramirez Barcelona Supercomputing Center Technical Coordinator This project and the research leading to these results

More information

I/O, today, is Remote (block) Load/Store, and must not be slower than Compute, any more

I/O, today, is Remote (block) Load/Store, and must not be slower than Compute, any more I/O, today, is Remote (block) Load/Store, and must not be slower than Compute, any more Manolis Katevenis FORTH, Heraklion, Crete, Greece (in collab. with Univ. of Crete) http://www.ics.forth.gr/carv/

More information

Results from TSUBAME3.0 A 47 AI- PFLOPS System for HPC & AI Convergence

Results from TSUBAME3.0 A 47 AI- PFLOPS System for HPC & AI Convergence Results from TSUBAME3.0 A 47 AI- PFLOPS System for HPC & AI Convergence Jens Domke Research Staff at MATSUOKA Laboratory GSIC, Tokyo Institute of Technology, Japan Omni-Path User Group 2017/11/14 Denver,

More information

The ExaNeSt Project: Interconnects, Storage, and Packaging for Exascale Systems

The ExaNeSt Project: Interconnects, Storage, and Packaging for Exascale Systems 1 The ExaNeSt Project: Interconnects, Storage, and Packaging for Exascale Systems M. Katevenis *, N. Chrysos *, M. Marazakis *, I. Mavroidis *, F. Chaix *, N. Kallimanis *, J. Navaridas, J. Goodacre, P.

More information

COSMOS Architecture and Key Technologies. June 1 st, 2018 COSMOS Team

COSMOS Architecture and Key Technologies. June 1 st, 2018 COSMOS Team COSMOS Architecture and Key Technologies June 1 st, 2018 COSMOS Team COSMOS: System Architecture (2) System design based on three levels of SDR radio node (S,M,L) with M,L connected via fiber to optical

More information

White paper FUJITSU Supercomputer PRIMEHPC FX100 Evolution to the Next Generation

White paper FUJITSU Supercomputer PRIMEHPC FX100 Evolution to the Next Generation White paper FUJITSU Supercomputer PRIMEHPC FX100 Evolution to the Next Generation Next Generation Technical Computing Unit Fujitsu Limited Contents FUJITSU Supercomputer PRIMEHPC FX100 System Overview

More information

The Evolution of the ARM Architecture Towards Big Data and the Data-Centre

The Evolution of the ARM Architecture Towards Big Data and the Data-Centre The Evolution of the ARM Architecture Towards Big Data and the Data-Centre 8th Workshop on Virtualization in High-Performance Cloud Computing (VHPC'13) held in conjunction with SC 13, Denver, Colorado

More information

FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS

FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS Mike Ashworth, Graham Riley, Andrew Attwood and John Mawer Advanced Processor Technologies Group School

More information

FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS

FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS Mike Ashworth, Graham Riley, Andrew Attwood and John Mawer Advanced Processor Technologies Group School

More information

Overview of Tianhe-2

Overview of Tianhe-2 Overview of Tianhe-2 (MilkyWay-2) Supercomputer Yutong Lu School of Computer Science, National University of Defense Technology; State Key Laboratory of High Performance Computing, China ytlu@nudt.edu.cn

More information

IBM CORAL HPC System Solution

IBM CORAL HPC System Solution IBM CORAL HPC System Solution HPC and HPDA towards Cognitive, AI and Deep Learning Deep Learning AI / Deep Learning Strategy for Power Power AI Platform High Performance Data Analytics Big Data Strategy

More information

Cray XC Scalability and the Aries Network Tony Ford

Cray XC Scalability and the Aries Network Tony Ford Cray XC Scalability and the Aries Network Tony Ford June 29, 2017 Exascale Scalability Which scalability metrics are important for Exascale? Performance (obviously!) What are the contributing factors?

More information

Messaging Overview. Introduction. Gen-Z Messaging

Messaging Overview. Introduction. Gen-Z Messaging Page 1 of 6 Messaging Overview Introduction Gen-Z is a new data access technology that not only enhances memory and data storage solutions, but also provides a framework for both optimized and traditional

More information

The Many Dimensions of SDR Hardware

The Many Dimensions of SDR Hardware The Many Dimensions of SDR Hardware Plotting a Course for the Hardware Behind the Software Sept 2017 John Orlando Epiq Solutions LO RFIC Epiq Solutions in a Nutshell Schaumburg, IL EST 2009 N. Virginia

More information

How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC

How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC Three Consortia Formed in Oct 2016 Gen-Z Open CAPI CCIX complex to rack scale memory fabric Cache coherent accelerator

More information

A Disseminated Distributed OS for Hardware Resource Disaggregation Yizhou Shan

A Disseminated Distributed OS for Hardware Resource Disaggregation Yizhou Shan LegoOS A Disseminated Distributed OS for Hardware Resource Disaggregation Yizhou Shan, Yutong Huang, Yilun Chen, and Yiying Zhang Y 4 1 2 Monolithic Server OS / Hypervisor 3 Problems? 4 cpu mem Resource

More information

Inspur AI Computing Platform

Inspur AI Computing Platform Inspur Server Inspur AI Computing Platform 3 Server NF5280M4 (2CPU + 3 ) 4 Server NF5280M5 (2 CPU + 4 ) Node (2U 4 Only) 8 Server NF5288M5 (2 CPU + 8 ) 16 Server SR BOX (16 P40 Only) Server target market

More information

Mapping MPI+X Applications to Multi-GPU Architectures

Mapping MPI+X Applications to Multi-GPU Architectures Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under

More information

Blue Gene/Q. Hardware Overview Michael Stephan. Mitglied der Helmholtz-Gemeinschaft

Blue Gene/Q. Hardware Overview Michael Stephan. Mitglied der Helmholtz-Gemeinschaft Blue Gene/Q Hardware Overview 02.02.2015 Michael Stephan Blue Gene/Q: Design goals System-on-Chip (SoC) design Processor comprises both processing cores and network Optimal performance / watt ratio Small

More information

Octopus: A Multi-core implementation

Octopus: A Multi-core implementation Octopus: A Multi-core implementation Kalpesh Sheth HPEC 2007, MIT, Lincoln Lab Export of this products is subject to U.S. export controls. Licenses may be required. This material provides up-to-date general

More information

Commodity Converged Fabrics for Global Address Spaces in Accelerator Clouds

Commodity Converged Fabrics for Global Address Spaces in Accelerator Clouds Commodity Converged Fabrics for Global Address Spaces in Accelerator Clouds Jeffrey Young, Sudhakar Yalamanchili School of Electrical and Computer Engineering, Georgia Institute of Technology Motivation

More information

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Preparing GPU-Accelerated Applications for the Summit Supercomputer Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership

More information

Dell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions

Dell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions Dell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions A comparative analysis with PowerEdge R510 and PERC H700 Global Solutions Engineering Dell Product

More information

Gen-Z Memory-Driven Computing

Gen-Z Memory-Driven Computing Gen-Z Memory-Driven Computing Our vision for the future of computing Patrick Demichel Distinguished Technologist Explosive growth of data More Data Need answers FAST! Value of Analyzed Data 2005 0.1ZB

More information

Atos ARM solutions for HPC

Atos ARM solutions for HPC Atos ARM solutions for HPC Eric Eppe Head of Solution Marketing & Portfolio HPC & Quantum Global Business Line Tuesday, March 7th, HPC User Forum, TERATEC Atos HPC and ARM A long time engagement 2012 2013

More information

Highly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture

Highly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture A Cost Effective,, High g Performance,, Highly Scalable, Non-RDMA NVMe Fabric Bob Hansen,, VP System Architecture bob@apeirondata.com Storage Developers Conference, September 2015 Agenda 3 rd Platform

More information

Building blocks for 64-bit Systems Development of System IP in ARM

Building blocks for 64-bit Systems Development of System IP in ARM Building blocks for 64-bit Systems Development of System IP in ARM Research seminar @ University of York January 2015 Stuart Kenny stuart.kenny@arm.com 1 2 64-bit Mobile Devices The Mobile Consumer Expects

More information

Near Memory Key/Value Lookup Acceleration MemSys 2017

Near Memory Key/Value Lookup Acceleration MemSys 2017 Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy

More information

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc Scaling to Petaflop Ola Torudbakken Distinguished Engineer Sun Microsystems, Inc HPC Market growth is strong CAGR increased from 9.2% (2006) to 15.5% (2007) Market in 2007 doubled from 2003 (Source: IDC

More information

Enabling FPGAs in Hyperscale Data Centers

Enabling FPGAs in Hyperscale Data Centers J. Weerasinghe; IEEE CBDCom 215, Beijing; 13 th August 215 Enabling s in Hyperscale Data Centers J. Weerasinghe 1, F. Abel 1, C. Hagleitner 1, A. Herkersdorf 2 1 IBM Research Zurich Laboratory 2 Technical

More information

Tightly Coupled Accelerators Architecture

Tightly Coupled Accelerators Architecture Tightly Coupled Accelerators Architecture Yuetsu Kodama Division of High Performance Computing Systems Center for Computational Sciences University of Tsukuba, Japan 1 What is Tightly Coupled Accelerators

More information

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance

More information

The Road from Peta to ExaFlop

The Road from Peta to ExaFlop The Road from Peta to ExaFlop Andreas Bechtolsheim June 23, 2009 HPC Driving the Computer Business Server Unit Mix (IDC 2008) Enterprise HPC Web 100 75 50 25 0 2003 2008 2013 HPC grew from 13% of units

More information

The rcuda middleware and applications

The rcuda middleware and applications The rcuda middleware and applications Will my application work with rcuda? rcuda currently provides binary compatibility with CUDA 5.0, virtualizing the entire Runtime API except for the graphics functions,

More information

A Breakthrough in Non-Volatile Memory Technology FUJITSU LIMITED

A Breakthrough in Non-Volatile Memory Technology FUJITSU LIMITED A Breakthrough in Non-Volatile Memory Technology & 0 2018 FUJITSU LIMITED IT needs to accelerate time-to-market Situation: End users and applications need instant access to data to progress faster and

More information

BlueGene/L. Computer Science, University of Warwick. Source: IBM

BlueGene/L. Computer Science, University of Warwick. Source: IBM BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours

More information

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Outline History & Motivation Architecture Core architecture Network Topology Memory hierarchy Brief comparison to GPU & Tilera Programming Applications

More information

SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture

SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture 2012 MELLANOX TECHNOLOGIES 1 SwitchX - Virtual Protocol Interconnect Solutions Server / Compute Switch / Gateway Virtual Protocol Interconnect

More information

OCP Engineering Workshop - Telco

OCP Engineering Workshop - Telco OCP Engineering Workshop - Telco Low Latency Mobile Edge Computing Trevor Hiatt Product Management, IDT IDT Company Overview Founded 1980 Workforce Approximately 1,800 employees Headquarters San Jose,

More information

M7: Next Generation SPARC. Hotchips 26 August 12, Stephen Phillips Senior Director, SPARC Architecture Oracle

M7: Next Generation SPARC. Hotchips 26 August 12, Stephen Phillips Senior Director, SPARC Architecture Oracle M7: Next Generation SPARC Hotchips 26 August 12, 2014 Stephen Phillips Senior Director, SPARC Architecture Oracle Safe Harbor Statement The following is intended to outline our general product direction.

More information

Building NVLink for Developers

Building NVLink for Developers Building NVLink for Developers Unleashing programmatic, architectural and performance capabilities for accelerated computing Why NVLink TM? Simpler, Better and Faster Simplified Programming No specialized

More information

Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation

Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation Yiying Zhang Datacenter 3 Monolithic Computer OS / Hypervisor 4 Can monolithic Application Hardware

More information

Proposers Day Workshop

Proposers Day Workshop Proposers Day Workshop Monday, January 23, 2017 @srcjump, #JUMPpdw Intelligent Memory and Storage Vertical Research Center Sean Eilert Fellow Micron Technology High Level Overview Conventional Bottlenecks

More information

1. NoCs: What s the point?

1. NoCs: What s the point? 1. Nos: What s the point? What is the role of networks-on-chip in future many-core systems? What topologies are most promising for performance? What about for energy scaling? How heavily utilized are Nos

More information

Interconnect Your Future

Interconnect Your Future Interconnect Your Future Gilad Shainer 2nd Annual MVAPICH User Group (MUG) Meeting, August 2014 Complete High-Performance Scalable Interconnect Infrastructure Comprehensive End-to-End Software Accelerators

More information

The way toward peta-flops

The way toward peta-flops The way toward peta-flops ISC-2011 Dr. Pierre Lagier Chief Technology Officer Fujitsu Systems Europe Where things started from DESIGN CONCEPTS 2 New challenges and requirements! Optimal sustained flops

More information

Paving the Road to Exascale

Paving the Road to Exascale Paving the Road to Exascale Gilad Shainer August 2015, MVAPICH User Group (MUG) Meeting The Ever Growing Demand for Performance Performance Terascale Petascale Exascale 1 st Roadrunner 2000 2005 2010 2015

More information

INCREASE IT EFFICIENCY, REDUCE OPERATING COSTS AND DEPLOY ANYWHERE

INCREASE IT EFFICIENCY, REDUCE OPERATING COSTS AND DEPLOY ANYWHERE www.iceotope.com DATA SHEET INCREASE IT EFFICIENCY, REDUCE OPERATING COSTS AND DEPLOY ANYWHERE BLADE SERVER TM PLATFORM 80% Our liquid cooling platform is proven to reduce cooling energy consumption by

More information

CPU Agnostic Motherboard design with RapidIO Interconnect in Data Center

CPU Agnostic Motherboard design with RapidIO Interconnect in Data Center Agnostic Motherboard design with RapidIO Interconnect in Data Center Devashish Paul Senior Product Manager IDT Chairman RapidIO Trade Association: Marketing Council 2013 RapidIO Trade Association Agenda

More information

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA STATE OF THE ART 2012 18,688 Tesla K20X GPUs 27 PetaFLOPS FLAGSHIP SCIENTIFIC APPLICATIONS

More information

RDMA in Embedded Fabrics

RDMA in Embedded Fabrics RDMA in Embedded Fabrics Ken Cain, kcain@mc.com Mercury Computer Systems 06 April 2011 www.openfabrics.org 2011 Mercury Computer Systems, Inc. www.mc.com Uncontrolled for Export Purposes 1 Outline Embedded

More information

New! New! New! New! New!

New! New! New! New! New! New! New! New! New! New! Model 5950 Features Supports Xilinx Zynq UltraScale+ RFSoC FPGAs 18 GB of DDR4 SDRAM On-board GPS receiver PCI Express (Gen. 1, 2 and 3) interface up to x8 LVDS connections to

More information

Emerging IC Packaging Platforms for ICT Systems - MEPTEC, IMAPS and SEMI Bay Area Luncheon Presentation

Emerging IC Packaging Platforms for ICT Systems - MEPTEC, IMAPS and SEMI Bay Area Luncheon Presentation Emerging IC Packaging Platforms for ICT Systems - MEPTEC, IMAPS and SEMI Bay Area Luncheon Presentation Dr. Li Li Distinguished Engineer June 28, 2016 Outline Evolution of Internet The Promise of Internet

More information

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink Robert Kaye 1 Agenda Once upon a time ARM designed systems Compute trends Bringing it all together with CoreLink 400

More information

MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구

MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 Leading Supplier of End-to-End Interconnect Solutions Analyze Enabling the Use of Data Store ICs Comprehensive End-to-End InfiniBand and Ethernet Portfolio

More information

SmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center

SmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center SmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center Jeff Defilippi Senior Product Manager Arm #Arm Tech Symposia The Cloud to Edge Infrastructure Foundation for a World of 1T Intelligent

More information

THE PATH TO EXASCALE COMPUTING. Bill Dally Chief Scientist and Senior Vice President of Research
