The ExaNeSt Project: Interconnects, Storage, and Packaging for Exascale Systems
1 The ExaNeSt Project: Interconnects, Storage, and Packaging for Exascale Systems
M. Katevenis, Nikolaos Chrysos, et al.
Foundation for Research & Technology - Hellas (FORTH)
On behalf of the ExaNeSt Consortium
Euromicro DSD 2016, Limassol, Aug. 31
2 The ExaNeSt Consortium
- Partner countries: Germany, Netherlands, Italy, UK, Greece (coordinator), UPV (ES)
- Work areas: Storage & data, Applications, Technology, Interconnects
3 What ExaNeSt is about
- ARMv8, UNIMEM Partitioned Global Address Space (PGAS)
- Low-energy compute, low-overhead communication
- Heterogeneous: FPGA accelerators
- Working closely with ExaNoDe, EcoScale (& EuroServer)
- Network: unified compute & storage, low latency
- Storage: distributed, in-node non-volatile memories
- Extreme compute density: totally-liquid cooling
- Prototype: 1K cores, 4 TBytes DRAM, 40 TBytes SSD, 0.5 M DSP slices
- Real applications: scientific, engineering, data analytics
4 The ExaNeSt Prototype ( )
- Using Xilinx Zynq UltraScale+ FPGAs
- Four 64-bit ARM cores per FPGA
- Quad FPGA Daughter Boards (QFDB): four FPGAs per QFDB
- 8 QFDBs per blade
- System: a dozen blades
5 The ExaNeSt Prototype ( )
- Using Xilinx Zynq UltraScale+ FPGAs
- Four 64-bit ARM cores per FPGA
- Electronics immersed in 3M Novec liquid
- Quad FPGA Daughter Boards (QFDB): four FPGAs per QFDB
- 8 QFDBs per blade
- System: a dozen blades
- Rack-level water circulation
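
The per-level counts on this slide multiply out to a total core count. The sketch below does only that arithmetic; it assumes "a dozen blades" means exactly 12 and that every blade is fully populated with 8 QFDBs.

```python
# Counts read off the slide; BLADES = 12 is an assumption ("a dozen").
CORES_PER_FPGA = 4
FPGAS_PER_QFDB = 4
QFDBS_PER_BLADE = 8
BLADES = 12


def prototype_totals():
    """Multiply the per-level counts up to whole-system totals."""
    qfdbs = QFDBS_PER_BLADE * BLADES
    fpgas = qfdbs * FPGAS_PER_QFDB
    cores = fpgas * CORES_PER_FPGA
    return {"qfdbs": qfdbs, "fpgas": fpgas, "cores": cores}


totals = prototype_totals()
print(totals)
```

Under these assumptions the system has 96 QFDBs, 384 FPGAs and 1536 ARM cores, which is the same order of magnitude as the "1K cores" figure quoted for the prototype.
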
6 ExaNeSt: Unimem PGAS Memory Model
- Enables remote loads/stores to a global address space
- System-wide coherent memories without expensive hardware: only one node may cache data
- Global virtual address space
  - Resiliency: a page can move seamlessly upon node failures
  - Difficult to maintain a global page table
7 ExaNeSt Unimem Implementation
- Enables remote loads/stores to a global address space
- System-wide coherent memories without expensive hardware: only one node may cache data
- Global virtual address space
  - Resiliency: a page can move seamlessly upon node failures
  - Difficult to maintain a global page table
- ExaNeSt: pages stay within a coherence island (node)
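
The "only one node may cache data" rule can be illustrated with a toy software model. This is a sketch under stated assumptions, not the ExaNeSt hardware: the address split (`OFFSET_BITS`), the class names and the write-through policy are all hypothetical and chosen only to keep the example small.

```python
OFFSET_BITS = 40  # hypothetical: low bits address memory inside one node


def make_address(node_id, offset):
    """Build a global address from (owning node, local offset)."""
    return (node_id << OFFSET_BITS) | offset


def split_address(gaddr):
    """Recover (owning node, local offset) from a global address."""
    return gaddr >> OFFSET_BITS, gaddr & ((1 << OFFSET_BITS) - 1)


class Node:
    """One coherence island: local DRAM plus a cache for local data only."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.dram = {}
        self.cache = {}

    def load(self, offset):
        if offset not in self.cache:
            self.cache[offset] = self.dram.get(offset, 0)
        return self.cache[offset]

    def store(self, offset, value):
        self.cache[offset] = value
        self.dram[offset] = value  # write-through keeps the toy model simple


class UnimemSystem:
    """Routes every load/store to the owning node; requesters never cache."""

    def __init__(self, n_nodes):
        self.nodes = [Node(i) for i in range(n_nodes)]

    def load(self, requester_id, gaddr):
        owner, offset = split_address(gaddr)
        return self.nodes[owner].load(offset)

    def store(self, requester_id, gaddr, value):
        owner, offset = split_address(gaddr)
        self.nodes[owner].store(offset, value)


system = UnimemSystem(n_nodes=2)
addr = make_address(node_id=1, offset=0x100)
system.store(requester_id=0, gaddr=addr, value=42)  # remote store from node 0
value_seen_by_owner = system.load(requester_id=1, gaddr=addr)
```

Because node 0 never keeps a copy of node 1's data, there is no second cached copy to invalidate, which is the sense in which coherence comes without extra hardware in this model.
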
8 ExaNeSt Package (Coherence Island): Xilinx Zynq UltraScale+
- Trenz board; Xilinx Zynq UltraScale+ FPGA
- ExaNeSt prototype: among the first to use 64-bit ARM FPGAs
- Processing, 1.2 GHz: 4 Cortex-A53 ARM cores, 4.8 GFLOPS
- Plus: real-time processors (Cortex-R5), IOMMU, virtualized DMA engine
- Programmable logic: 2.5K DSP add-mul, 300 MHz, 250-1K GFLOPS
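
The 4.8 GFLOPS figure for the ARM cores follows from cores times clock times operations per cycle. The sketch below backs out the per-core rate that the slide's numbers imply; the value of 1 FLOP per cycle per core is derived here from those numbers and is not stated on the slide.

```python
import math


def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Peak rate = cores x clock (GHz) x floating-point ops per cycle."""
    return cores * clock_ghz * flops_per_cycle


# Slide values: 4 Cortex-A53 cores at 1.2 GHz, 4.8 GFLOPS in total.
implied_flops_per_cycle = 4.8 / (4 * 1.2)
package_cpu_gflops = peak_gflops(cores=4, clock_ghz=1.2, flops_per_cycle=1)

print(implied_flops_per_cycle, package_cpu_gflops)
```
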
9 ExaNeSt Node: Quad-FPGA-DaughterBoard (QFDB)
- 4 UltraScale+ FPGAs
- All-to-all connectivity: 2 x HSS (GTH) + 16 x LVDS
- 64 GBytes DDR Gb/s
- 512 GBytes SSD/NVMe, 4x PCIe v2 (8 GBytes/s)
- 10 HSS links to remote: 10 Gb/s per link, 16 Gb/s best case
- 120 x 130 mm2
- Currently in layout + fabrication
10 ExaNeSt Blade: Packaging and Cooling Unit
- Initially: 4 QFDBs + 2 KALEAO + 2 thermal-only DBs for tests
- Later: 16 QFDB-compatible slots
- Passive interconnect among local QFDBs: custom mesh-like network
- 32 SFP+ (cable slots) for system interconnect: 500+ Gb/s per blade
- Figure labels: Blade, Mezzanine board, QFDB, PCB HSS links, SFP+ cables
11 Flexible System-Interconnect Topologies
- Tier 1: Blade; Tier 2: System
- Multi-level Dragonfly: QFDB, blade, system
  - Small diameter
  - High bisection
  - Few global wires
- Hybrid direct + indirect networks: Dragonfly + central routing boards
- Segregate throughput-sensitive from latency-sensitive traffic
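
To see why a Dragonfly-style network has a small diameter with few global wires, the sketch below builds a generic single-level Dragonfly graph and measures its diameter by breadth-first search. It is not the ExaNeSt topology: the group count, routers per group and the rule for attaching global links (`gateway`) are hypothetical choices for illustration.

```python
from collections import deque
from itertools import combinations


def build_dragonfly(groups, routers_per_group):
    """Fully connected groups, plus one global link per pair of groups."""
    adj = {(g, r): set() for g in range(groups) for r in range(routers_per_group)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    # Local links: all-to-all inside each group.
    for g in range(groups):
        for r1, r2 in combinations(range(routers_per_group), 2):
            link((g, r1), (g, r2))

    # Global links: spread a group's links to other groups over its routers.
    def gateway(g, h):
        index = h if h < g else h - 1
        return (g, index % routers_per_group)

    for g, h in combinations(range(groups), 2):
        link(gateway(g, h), gateway(h, g))
    return adj


def diameter(adj):
    """Longest shortest path, found by BFS from every router."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


network = build_dragonfly(groups=5, routers_per_group=4)
global_links = sum(1 for a in network for b in network[a] if a[0] != b[0]) // 2
print(diameter(network), global_links)
```

With 5 groups of 4 routers, every pair of the 20 routers is at most three hops apart (local, global, local) while only 10 global links are needed.
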
12 ExaNeSt: Interconnection Network
Design goals:
- Low-latency RDMA: true zero copy
- Flow prioritization: short (compute) vs bulky (storage)
- Throttle congestive flows at network edges, at DMA sources
- Resiliency: error detect/correct, monitor links, multipath routing
- All-optical proof-of-concept switch using 2x2/4x4 building blocks
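
The slide names two goals, prioritizing short flows over bulky ones and throttling at the DMA source, without describing the mechanism. The sketch below uses a generic token bucket as a stand-in technique; the class names, rates and the choice to leave short transfers unthrottled are assumptions, not the ExaNeSt design.

```python
from collections import deque


class TokenBucket:
    """Classic token bucket: refill per tick, spend tokens per byte sent."""

    def __init__(self, rate_bytes_per_tick, burst_bytes):
        self.rate = rate_bytes_per_tick
        self.burst = burst_bytes
        self.tokens = burst_bytes

    def tick(self):
        self.tokens = min(self.burst, self.tokens + self.rate)

    def try_consume(self, nbytes):
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class DmaSource:
    """Hypothetical DMA source with a short queue and a throttled bulk queue."""

    def __init__(self, bulk_bucket):
        self.short = deque()
        self.bulk = deque()
        self.bucket = bulk_bucket

    def submit(self, name, nbytes, bulky):
        (self.bulk if bulky else self.short).append((name, nbytes))

    def tick(self):
        """Send all short transfers first, then bulk while tokens last."""
        self.bucket.tick()
        sent = [name for name, _ in self.short]
        self.short.clear()
        while self.bulk and self.bucket.try_consume(self.bulk[0][1]):
            sent.append(self.bulk.popleft()[0])
        return sent


source = DmaSource(TokenBucket(rate_bytes_per_tick=100, burst_bytes=200))
source.submit("b1", 150, bulky=True)
source.submit("b2", 150, bulky=True)
source.submit("s1", 8, bulky=False)
first_tick = source.tick()
second_tick = source.tick()
print(first_tick, second_tick)
```

The short transfer leaves ahead of both bulk transfers even though it was submitted last, and the second bulk transfer waits one tick for the bucket to refill.
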
13 ExaNeSt Interconnect Hierarchy
Table columns: Hierarchy, Tech, Switching, Fanout, Bandwidth, Latency, Address Scheme, Reliability
- Hierarchy tiers: Tier 4 System, Tier 3, Tier 2, Tier 1, Tier 0 QFDB, UltraScale+ FPGA
- Hierarchy levels: Rack/Cabinet, Backplane, Chassis, Blade/Mezzanine, Node, Unit, Package, Chip-2-Chip, A53 cores
- Tech and switching entries: Optical, AXI Load/Store (weak order), Optical, Optical, AXI Xbar, Ethernet, Ethernet, Ethernet, AXI Xbar, APEnet, APEnet, APEnet, APEnet
- Fanout: T1-T2; >200 racks; 5-15 chassis; 6-24 nodes/1U; 4-16 nodes; 4 FPGAs; 4-6 SMP cores
- Bandwidth: LVDS 12x (14.4 Gbps), HSS 2x (32 Gbps), 40 Gbps
- Latency: low, 20 ns, 200 ns
- Address scheme: custom, MAC address, GAS^r, MAC to GAS partition, GAS^r partition
- Reliability: X, X, EDC, LO, FA, MO, ACK, bit corruption; OK, FAST, OK
14 ExaNeSt Storage Architecture
15 ExaNeSt Storage Architecture
- QFDBs with SSD storage
- Bring data closer to compute, inside QFDB-level SSDs
16 ExaNeSt: Per-Job On-Demand SSD Caches
- File payload: SSDs cache; on a miss, go to the storage server
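
The "SSDs cache; on a miss, storage server" behaviour is a read-through cache. The sketch below is a minimal in-memory model of that pattern; the class names are hypothetical, and the LRU eviction policy and the capacity counted in files are assumptions that the slide does not specify.

```python
from collections import OrderedDict


class StorageServer:
    """Stand-in for the remote storage server; counts how often it is hit."""

    def __init__(self, files):
        self.files = dict(files)
        self.reads = 0

    def read(self, path):
        self.reads += 1
        return self.files[path]


class JobSsdCache:
    """Per-job read-through cache with LRU eviction (assumed policy)."""

    def __init__(self, server, capacity):
        self.server = server
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, path):
        if path in self.entries:
            self.hits += 1
            self.entries.move_to_end(path)  # mark as most recently used
            return self.entries[path]
        self.misses += 1
        payload = self.server.read(path)  # miss: fetch from the server
        self.entries[path] = payload
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return payload


server = StorageServer({"a": b"alpha", "b": b"beta", "c": b"gamma"})
cache = JobSsdCache(server, capacity=2)
payloads = [cache.read(p) for p in ["a", "a", "b", "c", "a"]]
print(cache.hits, cache.misses, server.reads)
```

In this run only the repeated read of "a" is served locally; the final read of "a" misses again because "a" was evicted when "c" arrived.
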
17 Applications, Traces
Main applications:
- Material science: LAMMPS
- Climate change: REGCM
- Engineering: OpenFOAM, SailFish
- Astrophysics: Gadget, Pinocchio, Changa, Swift
- Neuroscience: DPSNN
- High Energy Physics: LQCD
- Data Analytics: MonetDB
Traces generated:
- Scalasca profiling tool: MPI calls instrumented
- Several GBytes per trace, filtered down to tens of MBytes by keeping what our network simulators will need
- Generally, to be made publicly available
Next, applications porting & tuning:
- Currently porting selected apps to ARM, on the EuroServer prototype
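
Filtering a trace down to what a network simulator needs amounts to dropping irrelevant events and irrelevant fields. The sketch below shows that idea on plain dictionaries; the record layout, the `KEEP_CALLS` set and the `KEEP_FIELDS` tuple are hypothetical and do not reflect the Scalasca trace format or the project's actual filter.

```python
# Hypothetical choices: which MPI calls and which fields a simulator keeps.
KEEP_CALLS = {"MPI_Send", "MPI_Recv", "MPI_Allreduce"}
KEEP_FIELDS = ("rank", "time", "call", "peer", "bytes")


def filter_trace(events):
    """Yield only communication events, stripped to the simulator's fields."""
    for event in events:
        if event["call"] in KEEP_CALLS:
            yield {key: event[key] for key in KEEP_FIELDS if key in event}


raw_events = [
    {"rank": 0, "time": 1.0, "call": "MPI_Send", "peer": 1, "bytes": 4096,
     "callstack": ["main", "exchange"]},
    {"rank": 0, "time": 1.5, "call": "MPI_Wtime", "callstack": ["main"]},
    {"rank": 1, "time": 1.1, "call": "MPI_Recv", "peer": 0, "bytes": 4096,
     "callstack": ["main", "exchange"]},
]
filtered = list(filter_trace(raw_events))
print(filtered)
```
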
18 Conclusions: ExaNeSt TODOs
- Optimize, integrate & evaluate core system-level components:
  - ARM/Unimem
  - Packaging & cooling
  - Interconnects
  - Distributed NVM / storage
  - Fine-tuned applications
- Large-scale optimized 64-bit ARM prototype, also leveraged by other FET-HPC projects: ExaNoDe and EcoScale ( )
19 European Exascale System Interconnect & Storage
- Interconnection Network
- In-node Storage
- Advanced Cooling
- Real Applications
Stay
20 Backup
21 ExaNeSt RDMA Operation Overview
- From user space to user space: no kernel, no copies
- No page pinning, to avoid OS overheads: dest page fault
- Src DMA channels implement rate congestion
22 Potential for International Cooperations
Application Programming Interfaces (APIs) needed for taking advantage of new technologies:
- NVMs / distributed storage
- Zero-copy, user-level communication (RDMA, mailboxes)
- Congestion mitigation & resilience in networks
23 Relations with cPPP, SRA, other Projects
- cPPP: we feel part of it and a contributor to it; cPPP is necessary for our goals
- SRA: very useful for planning; some of our partners are already contributors, more to come
- Relations with other FETHPC/CoE:
  - Already within a group with ExaNoDe & EcoScale
  - Minimal collaboration axis: low-energy (ARM), UNIMEM
  - Looking forward to widening the group on this axis
  - Looking forward to follow-up projects & EsD on the same axis
- Application CoEs are essential for HW-SW co-design
- Also need a CoE on HP Computing Systems Arch & SysSW
24 ExaNeSt RDMA Receive Context Table
25 ExaNeSt System Hierarchy
Table columns: Hierarchy, Scale, Performance, DRAM, Storage, MaxPower
Hierarchy levels and their scale:
- Chiplet (DoA): heterogeneous CPU/GPU comp unit
- Interposer (3D-IC): 4 x Chiplet (DoA)
- Compute Node (shared IO & accel.): 2 Interposer plus I/O + OpenCL FPGA
- Package ('16-17): Xilinx Zynq UltraScale+ FPGA XCZU9EG, CPU/GPU/DSP
- Package (2020+): 2+ FPGAs on MCM, new technology
- Compute Element (DB PCB): 2 x Node
- Daughter Board (New Tech)
Table values, in extracted order: GFLOPS 8 CPUs; GFLOPS 32 CPUs; 1 packages; 3.5 TFLOPS 64 CPUs; 1 package; 1 package; 2 packages; 4 packages; 250 GFLOPS 4 ARM-53 CPUs (GPU + 2.5K DSPs); 1.5 TFLOPS 32 CPUs; 7 TFLOPS 128 CPUs; Up to 6x 8GB virtualized; 15 W (16 GB); 64 GB virtualized; 70 W; 128 GB; 18 GBytes DDR4; 64 GBytes HMC; 6 TFLOPS 128 CPUs; 256 GB; Host SSD; GB virtualized; virtualized; 140 W + 20 W for I/O; 20 W; 50 W; 256 GB; 6.8 TB; 320 W; SSD 4TB; 200 W
26 ExaNeSt System Hierarchy
Hierarchy | Scale | Performance | DRAM | Storage | MaxPower
Daughter Board (New Tech) | 4 packages | 6 TFLOPS, 128 CPUs | 256 GB | SSD 4 TB | 200 W
Mezzanine (motherboard for Elements) | 4 x Element, 8 packages | 28 TFLOPS, 512 CPUs | 1 TB | 27 TB | 1.28 kW, W Interconnect
Blade (deployment unit / hot-swap) | 3 x Mezzanine, 24 packages | 84 TFLOPS, 1536 CPUs | 3 TB | 81 TB | 4.2 kW, W cooling
Blade | 16 x DaughterBoards, 64 packages | 96 TFLOPS, 2K CPUs | 4 TB | 64 TB | 3.2 kW
Chassis | 6 x Blade + 2 NetBlades, 384 packages | 576 TFLOPS, 12.2K CPUs | 24 TB | 384 TB | 25.6 kW + 5 kW cooling
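
The Blade and Chassis rows built from Daughter Boards can be reproduced by plain multiplication. The sketch below scales the Daughter Board figures by 16 (one blade) and then by 6 (one chassis); the `Level` type is a hypothetical helper, and reading "256 GB x 16" as 4 TB assumes binary units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    packages: int
    tflops: int
    cpus: int
    dram_gb: int
    storage_tb: int
    power_w: int

    def times(self, n):
        """Scale every figure by n identical sub-units."""
        return Level(self.packages * n, self.tflops * n, self.cpus * n,
                     self.dram_gb * n, self.storage_tb * n, self.power_w * n)


# Daughter Board (New Tech) row from the slide.
DAUGHTER_BOARD = Level(packages=4, tflops=6, cpus=128,
                       dram_gb=256, storage_tb=4, power_w=200)
BLADE = DAUGHTER_BOARD.times(16)   # "16 x DaughterBoards"
CHASSIS = BLADE.times(6)           # compute blades only, NetBlades excluded

print(BLADE)
print(CHASSIS)
```

The results match the slide for packages, TFLOPS, CPUs, DRAM and storage. Six blades alone draw 19.2 kW; the listed 25.6 kW equals eight times the 3.2 kW blade figure, which would fit counting the two NetBlades at blade power, though that reading is an assumption.
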
27 ExaNeSt System Hierarchy
Hierarchy | Scale | Performance | DRAM | Storage | MaxPower
Chassis | 6 x Blade + 2 NetBlades, 384 packages | 576 TFLOPS, 12.2K CPUs | 24 TB | 384 TB | 25.6 kW + 5 kW cooling
Rack (metal frame) | 72 Blade, 1728 packages | 6 PFLOPS, 110K CPUs | 221 TB | 5.8 PB | 324 kW + 1 kW TOR
Rack (metal frame) | 12 x Chassis, 4608 packages | 6.9 PFLOPS, 147K CPUs | 288 TB | 4.5 PB | 367 kW
Example HPC System | 100 x Rack, 173K packages | 500 PFLOPS, 11M CPUs | 22 PB | 58 PB | 32.5 MW
ExaScale Level | 167 x Rack, 288K packages | 1 ExaFLOPS, 18.5M CPUs | 37 PB | 1 ExaByte | 54 MW
Example HPC System | 100 Rack, 460K packages | 690 PFLOPS, 14.7M CPUs | 28.8 PB | 450 PB | 37 MW
Exascale | 144 x Rack, 663K packages | 1 ExaFLOPS, 21M CPUs | 41 PB | 684 PB | 53 MW
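
The 12-chassis rack and the 100-rack and 144-rack systems can be checked by scaling the Chassis row. The sketch below does that with a small dictionary; the exact CPU count of 12288 per chassis (32 per package) and the inclusion of the 5 kW cooling figure in chassis power are assumptions made to reproduce the rounded slide values.

```python
# Chassis row from the slide; cpus and power_kw use the assumptions above.
CHASSIS = {
    "packages": 384,
    "tflops": 576,
    "cpus": 12288,
    "dram_tb": 24,
    "storage_tb": 384,
    "power_kw": 25.6 + 5,
}


def scale(level, n):
    """Multiply every figure of a level by n identical units."""
    return {key: value * n for key, value in level.items()}


rack = scale(CHASSIS, 12)
system_100 = scale(rack, 100)
exascale = scale(rack, 144)

for name, level in [("rack", rack), ("100 racks", system_100), ("144 racks", exascale)]:
    print(name,
          level["packages"], "packages,",
          round(level["tflops"] / 1000, 1), "PFLOPS,",
          round(level["power_kw"] / 1000, 2), "MW")
```

The rack comes out at 4608 packages, about 6.9 PFLOPS and about 367 kW, and 144 racks give 663,552 packages, about 995 PFLOPS and about 52.9 MW, in line with the table. Storage for 144 racks computes to 4.5 PB x 144 = 648 PB, whereas the table lists 684 PB; the sketch does not resolve which figure is intended.
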
RapidIO.org Update. Mar RapidIO.org 1
RapidIO.org Update rickoco@rapidio.org Mar 2015 2015 RapidIO.org 1 Outline RapidIO Overview & Markets Data Center & HPC Communications Infrastructure Industrial Automation Military & Aerospace RapidIO.org
More informationEuroEXA Driving the technology towards exascale
EuroEXA Driving the technology towards exascale John Goodacre Professor of Computer Architectures Advanced Processor Technologies Group University of Manchester This presentation summarises my personal
More informationRapidIO.org Update.
RapidIO.org Update rickoco@rapidio.org June 2015 2015 RapidIO.org 1 Outline RapidIO Overview Benefits Interconnect Comparison Ecosystem System Challenges RapidIO Markets Data Center & HPC Communications
More informationMaximizing heterogeneous system performance with ARM interconnect and CCIX
Maximizing heterogeneous system performance with ARM interconnect and CCIX Neil Parris, Director of product marketing Systems and software group, ARM Teratec June 2017 Intelligent flexible cloud to enable
More informationSignal Conversion in a Modular Open Standard Form Factor. CASPER Workshop August 2017 Saeed Karamooz, VadaTech
Signal Conversion in a Modular Open Standard Form Factor CASPER Workshop August 2017 Saeed Karamooz, VadaTech At VadaTech we are technology leaders First-to-market silicon Continuous innovation Open systems
More informationBuilding supercomputers from embedded technologies
http://www.montblanc-project.eu Building supercomputers from embedded technologies Alex Ramirez Barcelona Supercomputing Center Technical Coordinator This project and the research leading to these results
More informationI/O, today, is Remote (block) Load/Store, and must not be slower than Compute, any more
I/O, today, is Remote (block) Load/Store, and must not be slower than Compute, any more Manolis Katevenis FORTH, Heraklion, Crete, Greece (in collab. with Univ. of Crete) http://www.ics.forth.gr/carv/
More informationResults from TSUBAME3.0 A 47 AI- PFLOPS System for HPC & AI Convergence
Results from TSUBAME3.0 A 47 AI- PFLOPS System for HPC & AI Convergence Jens Domke Research Staff at MATSUOKA Laboratory GSIC, Tokyo Institute of Technology, Japan Omni-Path User Group 2017/11/14 Denver,
More informationThe ExaNeSt Project: Interconnects, Storage, and Packaging for Exascale Systems
1 The ExaNeSt Project: Interconnects, Storage, and Packaging for Exascale Systems M. Katevenis *, N. Chrysos *, M. Marazakis *, I. Mavroidis *, F. Chaix *, N. Kallimanis *, J. Navaridas, J. Goodacre, P.
More informationCOSMOS Architecture and Key Technologies. June 1 st, 2018 COSMOS Team
COSMOS Architecture and Key Technologies June 1 st, 2018 COSMOS Team COSMOS: System Architecture (2) System design based on three levels of SDR radio node (S,M,L) with M,L connected via fiber to optical
More informationWhite paper FUJITSU Supercomputer PRIMEHPC FX100 Evolution to the Next Generation
White paper FUJITSU Supercomputer PRIMEHPC FX100 Evolution to the Next Generation Next Generation Technical Computing Unit Fujitsu Limited Contents FUJITSU Supercomputer PRIMEHPC FX100 System Overview
More informationThe Evolution of the ARM Architecture Towards Big Data and the Data-Centre
The Evolution of the ARM Architecture Towards Big Data and the Data-Centre 8th Workshop on Virtualization in High-Performance Cloud Computing (VHPC'13) held in conjunction with SC 13, Denver, Colorado
More informationFPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS
FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS Mike Ashworth, Graham Riley, Andrew Attwood and John Mawer Advanced Processor Technologies Group School
More informationFPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS
FPGA Acceleration of the LFRic Weather and Climate Model in the EuroExa Project Using Vivado HLS Mike Ashworth, Graham Riley, Andrew Attwood and John Mawer Advanced Processor Technologies Group School
More informationOverview of Tianhe-2
Overview of Tianhe-2 (MilkyWay-2) Supercomputer Yutong Lu School of Computer Science, National University of Defense Technology; State Key Laboratory of High Performance Computing, China ytlu@nudt.edu.cn
More informationIBM CORAL HPC System Solution
IBM CORAL HPC System Solution HPC and HPDA towards Cognitive, AI and Deep Learning Deep Learning AI / Deep Learning Strategy for Power Power AI Platform High Performance Data Analytics Big Data Strategy
More informationCray XC Scalability and the Aries Network Tony Ford
Cray XC Scalability and the Aries Network Tony Ford June 29, 2017 Exascale Scalability Which scalability metrics are important for Exascale? Performance (obviously!) What are the contributing factors?
More informationMessaging Overview. Introduction. Gen-Z Messaging
Page 1 of 6 Messaging Overview Introduction Gen-Z is a new data access technology that not only enhances memory and data storage solutions, but also provides a framework for both optimized and traditional
More informationThe Many Dimensions of SDR Hardware
The Many Dimensions of SDR Hardware Plotting a Course for the Hardware Behind the Software Sept 2017 John Orlando Epiq Solutions LO RFIC Epiq Solutions in a Nutshell Schaumburg, IL EST 2009 N. Virginia
More informationHow Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC
How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC Three Consortia Formed in Oct 2016 Gen-Z Open CAPI CCIX complex to rack scale memory fabric Cache coherent accelerator
More informationA Disseminated Distributed OS for Hardware Resource Disaggregation Yizhou Shan
LegoOS A Disseminated Distributed OS for Hardware Resource Disaggregation Yizhou Shan, Yutong Huang, Yilun Chen, and Yiying Zhang Y 4 1 2 Monolithic Server OS / Hypervisor 3 Problems? 4 cpu mem Resource
More informationInspur AI Computing Platform
Inspur Server Inspur AI Computing Platform 3 Server NF5280M4 (2CPU + 3 ) 4 Server NF5280M5 (2 CPU + 4 ) Node (2U 4 Only) 8 Server NF5288M5 (2 CPU + 8 ) 16 Server SR BOX (16 P40 Only) Server target market
More informationMapping MPI+X Applications to Multi-GPU Architectures
Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under
More informationBlue Gene/Q. Hardware Overview Michael Stephan. Mitglied der Helmholtz-Gemeinschaft
Blue Gene/Q Hardware Overview 02.02.2015 Michael Stephan Blue Gene/Q: Design goals System-on-Chip (SoC) design Processor comprises both processing cores and network Optimal performance / watt ratio Small
More informationOctopus: A Multi-core implementation
Octopus: A Multi-core implementation Kalpesh Sheth HPEC 2007, MIT, Lincoln Lab Export of this products is subject to U.S. export controls. Licenses may be required. This material provides up-to-date general
More informationCommodity Converged Fabrics for Global Address Spaces in Accelerator Clouds
Commodity Converged Fabrics for Global Address Spaces in Accelerator Clouds Jeffrey Young, Sudhakar Yalamanchili School of Electrical and Computer Engineering, Georgia Institute of Technology Motivation
More informationPreparing GPU-Accelerated Applications for the Summit Supercomputer
Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership
More informationDell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions
Dell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions A comparative analysis with PowerEdge R510 and PERC H700 Global Solutions Engineering Dell Product
More informationGen-Z Memory-Driven Computing
Gen-Z Memory-Driven Computing Our vision for the future of computing Patrick Demichel Distinguished Technologist Explosive growth of data More Data Need answers FAST! Value of Analyzed Data 2005 0.1ZB
More informationAtos ARM solutions for HPC
Atos ARM solutions for HPC Eric Eppe Head of Solution Marketing & Portfolio HPC & Quantum Global Business Line Tuesday, March 7th, HPC User Forum, TERATEC Atos HPC and ARM A long time engagement 2012 2013
More informationHighly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture
A Cost Effective,, High g Performance,, Highly Scalable, Non-RDMA NVMe Fabric Bob Hansen,, VP System Architecture bob@apeirondata.com Storage Developers Conference, September 2015 Agenda 3 rd Platform
More informationBuilding blocks for 64-bit Systems Development of System IP in ARM
Building blocks for 64-bit Systems Development of System IP in ARM Research seminar @ University of York January 2015 Stuart Kenny stuart.kenny@arm.com 1 2 64-bit Mobile Devices The Mobile Consumer Expects
More informationNear Memory Key/Value Lookup Acceleration MemSys 2017
Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy
More informationScaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc
Scaling to Petaflop Ola Torudbakken Distinguished Engineer Sun Microsystems, Inc HPC Market growth is strong CAGR increased from 9.2% (2006) to 15.5% (2007) Market in 2007 doubled from 2003 (Source: IDC
More informationEnabling FPGAs in Hyperscale Data Centers
J. Weerasinghe; IEEE CBDCom 215, Beijing; 13 th August 215 Enabling s in Hyperscale Data Centers J. Weerasinghe 1, F. Abel 1, C. Hagleitner 1, A. Herkersdorf 2 1 IBM Research Zurich Laboratory 2 Technical
More informationTightly Coupled Accelerators Architecture
Tightly Coupled Accelerators Architecture Yuetsu Kodama Division of High Performance Computing Systems Center for Computational Sciences University of Tsukuba, Japan 1 What is Tightly Coupled Accelerators
More informationCommunication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.
Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance
More informationThe Road from Peta to ExaFlop
The Road from Peta to ExaFlop Andreas Bechtolsheim June 23, 2009 HPC Driving the Computer Business Server Unit Mix (IDC 2008) Enterprise HPC Web 100 75 50 25 0 2003 2008 2013 HPC grew from 13% of units
More informationThe rcuda middleware and applications
The rcuda middleware and applications Will my application work with rcuda? rcuda currently provides binary compatibility with CUDA 5.0, virtualizing the entire Runtime API except for the graphics functions,
More informationA Breakthrough in Non-Volatile Memory Technology FUJITSU LIMITED
A Breakthrough in Non-Volatile Memory Technology & 0 2018 FUJITSU LIMITED IT needs to accelerate time-to-market Situation: End users and applications need instant access to data to progress faster and
More informationBlueGene/L. Computer Science, University of Warwick. Source: IBM
BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours
More informationIntel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins
Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Outline History & Motivation Architecture Core architecture Network Topology Memory hierarchy Brief comparison to GPU & Tilera Programming Applications
More informationSwitchX Virtual Protocol Interconnect (VPI) Switch Architecture
SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture 2012 MELLANOX TECHNOLOGIES 1 SwitchX - Virtual Protocol Interconnect Solutions Server / Compute Switch / Gateway Virtual Protocol Interconnect
More informationOCP Engineering Workshop - Telco
OCP Engineering Workshop - Telco Low Latency Mobile Edge Computing Trevor Hiatt Product Management, IDT IDT Company Overview Founded 1980 Workforce Approximately 1,800 employees Headquarters San Jose,
More informationM7: Next Generation SPARC. Hotchips 26 August 12, Stephen Phillips Senior Director, SPARC Architecture Oracle
M7: Next Generation SPARC Hotchips 26 August 12, 2014 Stephen Phillips Senior Director, SPARC Architecture Oracle Safe Harbor Statement The following is intended to outline our general product direction.
More informationBuilding NVLink for Developers
Building NVLink for Developers Unleashing programmatic, architectural and performance capabilities for accelerated computing Why NVLink TM? Simpler, Better and Faster Simplified Programming No specialized
More informationFarewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation
Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation Yiying Zhang Datacenter 3 Monolithic Computer OS / Hypervisor 4 Can monolithic Application Hardware
More informationProposers Day Workshop
Proposers Day Workshop Monday, January 23, 2017 @srcjump, #JUMPpdw Intelligent Memory and Storage Vertical Research Center Sean Eilert Fellow Micron Technology High Level Overview Conventional Bottlenecks
More information1. NoCs: What s the point?
1. Nos: What s the point? What is the role of networks-on-chip in future many-core systems? What topologies are most promising for performance? What about for energy scaling? How heavily utilized are Nos
More informationInterconnect Your Future
Interconnect Your Future Gilad Shainer 2nd Annual MVAPICH User Group (MUG) Meeting, August 2014 Complete High-Performance Scalable Interconnect Infrastructure Comprehensive End-to-End Software Accelerators
More informationThe way toward peta-flops
The way toward peta-flops ISC-2011 Dr. Pierre Lagier Chief Technology Officer Fujitsu Systems Europe Where things started from DESIGN CONCEPTS 2 New challenges and requirements! Optimal sustained flops
More informationPaving the Road to Exascale
Paving the Road to Exascale Gilad Shainer August 2015, MVAPICH User Group (MUG) Meeting The Ever Growing Demand for Performance Performance Terascale Petascale Exascale 1 st Roadrunner 2000 2005 2010 2015
More informationINCREASE IT EFFICIENCY, REDUCE OPERATING COSTS AND DEPLOY ANYWHERE
www.iceotope.com DATA SHEET INCREASE IT EFFICIENCY, REDUCE OPERATING COSTS AND DEPLOY ANYWHERE BLADE SERVER TM PLATFORM 80% Our liquid cooling platform is proven to reduce cooling energy consumption by
More informationCPU Agnostic Motherboard design with RapidIO Interconnect in Data Center
Agnostic Motherboard design with RapidIO Interconnect in Data Center Devashish Paul Senior Product Manager IDT Chairman RapidIO Trade Association: Marketing Council 2013 RapidIO Trade Association Agenda
More informationHETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA
HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA STATE OF THE ART 2012 18,688 Tesla K20X GPUs 27 PetaFLOPS FLAGSHIP SCIENTIFIC APPLICATIONS
More informationRDMA in Embedded Fabrics
RDMA in Embedded Fabrics Ken Cain, kcain@mc.com Mercury Computer Systems 06 April 2011 www.openfabrics.org 2011 Mercury Computer Systems, Inc. www.mc.com Uncontrolled for Export Purposes 1 Outline Embedded
More informationNew! New! New! New! New!
New! New! New! New! New! Model 5950 Features Supports Xilinx Zynq UltraScale+ RFSoC FPGAs 18 GB of DDR4 SDRAM On-board GPS receiver PCI Express (Gen. 1, 2 and 3) interface up to x8 LVDS connections to
More informationEmerging IC Packaging Platforms for ICT Systems - MEPTEC, IMAPS and SEMI Bay Area Luncheon Presentation
Emerging IC Packaging Platforms for ICT Systems - MEPTEC, IMAPS and SEMI Bay Area Luncheon Presentation Dr. Li Li Distinguished Engineer June 28, 2016 Outline Evolution of Internet The Promise of Internet
More informationBuilding High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye
Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink Robert Kaye 1 Agenda Once upon a time ARM designed systems Compute trends Bringing it all together with CoreLink 400
More informationMELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구
MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 Leading Supplier of End-to-End Interconnect Solutions Analyze Enabling the Use of Data Store ICs Comprehensive End-to-End InfiniBand and Ethernet Portfolio
More informationSmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center
SmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center Jeff Defilippi Senior Product Manager Arm #Arm Tech Symposia The Cloud to Edge Infrastructure Foundation for a World of 1T Intelligent
More informationTHE PATH TO EXASCALE COMPUTING. Bill Dally Chief Scientist and Senior Vice President of Research
THE PATH TO EXASCALE COMPUTING Bill Dally Chief Scientist and Senior Vice President of Research The Goal: Sustained ExaFLOPs on problems of interest 2 Exascale Challenges Energy efficiency Programmability
More informationBarcelona Supercomputing Center
www.bsc.es Barcelona Supercomputing Center Centro Nacional de Supercomputación EMIT 2016. Barcelona June 2 nd, 2016 Barcelona Supercomputing Center Centro Nacional de Supercomputación BSC-CNS objectives:
More informationUser Training Cray XC40 IITM, Pune
User Training Cray XC40 IITM, Pune Sudhakar Yerneni, Raviteja K, Nachiket Manapragada, etc. 1 Cray XC40 Architecture & Packaging 3 Cray XC Series Building Blocks XC40 System Compute Blade 4 Compute Nodes
More informationHigh performance Computing and O&G Challenges
High performance Computing and O&G Challenges 2 Seismic exploration challenges High Performance Computing and O&G challenges Worldwide Context Seismic,sub-surface imaging Computing Power needs Accelerating
More informationTile Processor (TILEPro64)
Tile Processor Case Study of Contemporary Multicore Fall 2010 Agarwal 6.173 1 Tile Processor (TILEPro64) Performance # of cores On-chip cache (MB) Cache coherency Operations (16/32-bit BOPS) On chip bandwidth
More informationElaborazione dati real-time su architetture embedded many-core e FPGA
Elaborazione dati real-time su architetture embedded many-core e FPGA DAVIDE ROSSI A L E S S A N D R O C A P O T O N D I G I U S E P P E T A G L I A V I N I A N D R E A M A R O N G I U C I R I - I C T
More informationTitan - Early Experience with the Titan System at Oak Ridge National Laboratory
Office of Science Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Buddy Bland Project Director Oak Ridge Leadership Computing Facility November 13, 2012 ORNL s Titan Hybrid
More informationAdaptable Intelligence The Next Computing Era
Adaptable Intelligence The Next Computing Era Hot Chips, August 21, 2018 Victor Peng, CEO, Xilinx Pervasive Intelligence from Cloud to Edge to Endpoints >> 1 Exponential Growth and Opportunities Data Explosion
More informationThe Future of GPU Computing
The Future of GPU Computing Bill Dally Chief Scientist & Sr. VP of Research, NVIDIA Bell Professor of Engineering, Stanford University November 18, 2009 The Future of Computing Bill Dally Chief Scientist
More informationRealizing the Next Generation of Exabyte-scale Persistent Memory-Centric Architectures and Memory Fabrics
Realizing the Next Generation of Exabyte-scale Persistent Memory-Centric Architectures and Memory Fabrics Zvonimir Z. Bandic, Sr. Director, Next Generation Platform Technologies Western Digital Corporation
More informationIn-Network Computing. Paving the Road to Exascale. 5th Annual MVAPICH User Group (MUG) Meeting, August 2017
In-Network Computing Paving the Road to Exascale 5th Annual MVAPICH User Group (MUG) Meeting, August 2017 Exponential Data Growth The Need for Intelligent and Faster Interconnect CPU-Centric (Onload) Data-Centric
More informationHardware and Software solutions for scaling highly threaded processors. Denis Sheahan Distinguished Engineer Sun Microsystems Inc.
Hardware and Software solutions for scaling highly threaded processors Denis Sheahan Distinguished Engineer Sun Microsystems Inc. Agenda Chip Multi-threaded concepts Lessons learned from 6 years of CMT
More informationGRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens. Jan Gray
If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens? Seymour Cray GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens Jan Gray jan@fpga.org http://fpga.org
More informationStrategies for Deploying RFSoC Technology for SIGINT, DRFM and Radar Applications. Rodger Hosking Pentek, Inc. WInnForum Webinar November 8, 2018
Strategies for Deploying RFSoC Technology for SIGINT, DRFM and Radar Applications Rodger Hosking Pentek, Inc. WInnForum Webinar November 8, 2018 1 Topics Xilinx RFSoC Overview Impact of Latency on Applications
More informationCCIX: a new coherent multichip interconnect for accelerated use cases
: a new coherent multichip interconnect for accelerated use cases Akira Shimizu Senior Manager, Operator relations Arm 2017 Arm Limited Arm 2017 Interconnects for different scale SoC interconnect. Connectivity
More informationBirds of a Feather Presentation
Mellanox InfiniBand QDR 4Gb/s The Fabric of Choice for High Performance Computing Gilad Shainer, shainer@mellanox.com June 28 Birds of a Feather Presentation InfiniBand Technology Leadership Industry Standard
More informationDesign of Scalable Network Considering Diameter and Cable Delay
Tohoku Design of Scalable etwork Considering Diameter and Cable Delay Kentaro Sano Tohoku University, JAPA Agenda Introduction Assumption Preliminary evaluation & candidate networks Cable length and delay
More informationSAP HANA. Jake Klein/ SVP SAP HANA June, 2013
SAP HANA Jake Klein/ SVP SAP HANA June, 2013 SAP 3 YEARS AGO Middleware BI / Analytics Core ERP + Suite 2013 WHERE ARE WE NOW? Cloud Mobile Applications SAP HANA Analytics D&T Changed Reality Disruptive
More informationNext Generation Enterprise Solutions from ARM
Next Generation Enterprise Solutions from ARM Ian Forsyth Director Product Marketing Enterprise and Infrastructure Applications Processor Product Line Ian.forsyth@arm.com 1 Enterprise Trends IT is the
More informationFuture of Interconnect Fabric A Contrarian View. Shekhar Borkar June 13, 2010 Intel Corp. 1
Future of Interconnect Fabric A ontrarian View Shekhar Borkar June 13, 2010 Intel orp. 1 Outline Evolution of interconnect fabric On die network challenges Some simple contrarian proposals Evaluation and
More informationDDN. DDN Updates. DataDirect Neworks Japan, Inc Nobu Hashizume. DDN Storage 2018 DDN Storage 1
1 DDN DDN Updates DataDirect Neworks Japan, Inc Nobu Hashizume DDN Storage 2018 DDN Storage 1 2 DDN A Broad Range of Technologies to Best Address Your Needs Your Use Cases Research Big Data Enterprise
More informationStrategies for Deploying Xilinx's Zynq UltraScale+ RFSoC
Strategies for Deploying Xilinx's Zynq UltraScale+ RFSoC by Robert Sgandurra Director, Product Management On February 21st, 2017, Xilinx announced the introduction of a new technology called RFSoC with
More informationThe Mont-Blanc Project
http://www.montblanc-project.eu The Mont-Blanc Project Daniele Tafani Leibniz Supercomputing Centre Ter@tec Forum 26th June 2013 This project and the research leading to these results has received funding
More informationAim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group
Aim High Intel Technical Update Teratec 07 Symposium June 20, 2007 Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Risk Factors Today's presentations contain forward-looking statements.
More informationDensity Optimized System Enabling Next-Gen Performance
Product brief High Performance Computing (HPC) and Hyper-Converged Infrastructure (HCI) Intel Server Board S2600BP Product Family Featuring the Intel Xeon Processor Scalable Family Density Optimized System
More informationInterconnection Network for Tightly Coupled Accelerators Architecture
Interconnection Network for Tightly Coupled Accelerators Architecture Toshihiro Hanawa, Yuetsu Kodama, Taisuke Boku, Mitsuhisa Sato Center for Computational Sciences University of Tsukuba, Japan 1 What
More informationOvercoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics
Overcoming the Memory System Challenge in Dataflow Processing Darren Jones, Wave Computing Drew Wingard, Sonics Current Technology Limits Deep Learning Performance Deep Learning Dataflow Graph Existing
More informationNVMe over Fabrics -
NVMe over Fabrics - High performance SSDs networked for composable infrastructure Rob Davis, VP Storage Technology, Mellanox OCP Evolution Server
More informationGRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens. Jan Gray
If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens? Seymour Cray GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens Jan Gray jan@fpga.org http://fpga.org
More informationVerification Futures Nick Heaton, Distinguished Engineer, Cadence Design Systems
Verification Futures 2016 Nick Heaton, Distinguished Engineer, Cadence Design Systems Agenda Update on Challenges presented in 2015, namely Scalability of the verification engines The rise of Use-Case Driven
More informationNew! New! New! New! New!
New! New! New! New! New! Model 5950 Features Supports Xilinx Zynq UltraScale+ RFSoC FPGAs 18 GB of DDR4 SDRAM On-board GPS receiver PCI Express (Gen. 1, 2 and 3) interface up to x8 LVDS connections to
More informationGodson Processor and its Application in High Performance Computers
Godson Processor and its Application in High Performance Computers Weiwu Hu Institute of Computing Technology, Chinese Academy of Sciences Loongson Technologies Corporation Limited hww@ict.ac.cn 1 Contents
More informationStacked Silicon Interconnect Technology (SSIT)
Stacked Silicon Interconnect Technology (SSIT) Suresh Ramalingam Xilinx Inc. MEPTEC, January 12, 2011 Agenda Background and Motivation Stacked Silicon Interconnect Technology Summary Background and Motivation
More informationPanel Discussion: The Future of I/O From a CPU Architecture Perspective
Panel Discussion: The Future of I/O From a CPU Architecture Perspective Brad Benton AMD, Inc. #OFADevWorkshop Issues Move to Exascale involves more parallel processing across more processing elements GPUs,
More informationFarewell to Servers: Resource Disaggregation
Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation Yiying Zhang 2 Monolithic Computer OS / Hypervisor 3 Can monolithic Application Hardware servers
More informationCarlo Cavazzoni, HPC department, CINECA
Introduction to Shared memory architectures Carlo Cavazzoni, HPC department, CINECA Modern Parallel Architectures Two basic architectural schemes: Distributed Memory Shared Memory Now most computers have
More informationJohn Fragalla TACC 'RANGER' INFINIBAND ARCHITECTURE WITH SUN TECHNOLOGY. Sun Microsystems
TACC 'RANGER' INFINIBAND ARCHITECTURE WITH SUN TECHNOLOGY John Fragalla Sun Microsystems Principal Engineer High Performance
More informationEnabling Technology for the Cloud and AI One Size Fits All?
Enabling Technology for the Cloud and AI One Size Fits All? Tim Horel Collaborate. Differentiate. Win. DIRECTOR, FIELD APPLICATIONS The Growing Cloud Global IP Traffic Growth 40B+ devices with intelligence
More informationPacket Switch Architecture
Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.