Trends in HPC Architectures


1 Trends in HPC Architectures Norbert Eicker, Institute for Advanced Simulation, Jülich Supercomputing Centre PRACE/LinkSCEEM-2 CyI 2011 Winter School, Nicosia, Cyprus

2 Forschungszentrum Jülich (FZJ) Slide 2

3 GCS: Gauss Centre for Supercomputing Germany's Tier-0/1 Supercomputing Complex Association with Garching and Stuttgart A single joint scientific governance Germany's representative in PRACE More information: Slide 3

4 Outline Today's common architectures Clusters MPP Accelerators Programming, Communication Exascale Challenges Energy Resiliency Applications Ideas on the way to Exascale BG/Q, QPACE, DEEP Conclusions Slide 4

5 Current HPC systems Today's supercomputers are (massively) parallel The TOP500 list gives an interesting (historical) overview Updated twice a year Based on the Linpack benchmark Solve a dense system of linear equations Ranks the 500 most powerful systems Cluster computers have dominated for several years Constellations are a kind of cluster, too Hard to distinguish MPPs from Clusters Less dominant in the TOP50 Slide 5

6 TOP 500 Architectures by Systems Slide 6

7 TOP 500 Architectures by Performance Slide 7

8 Cluster ingredients The processor is the heart Provides compute power Memory / Storage is the brain Nowadays many hierarchies (caches, DRAM, SSD, HD, tape) Networks are the nerves Link the nodes to each other Most often more than one (MPI, administration, I/O) Software is the soul No Cluster-awareness without middleware Often felt to be part of MPI, but more (process management, etc.) Open source was an important prerequisite Balance is more important than single components Slide 8

9 Cluster Computers TOP500 systems: At least 1000 cores Mostly standard processors: 78.4% Intel EM64T, 11.4% AMD, 8.0% Power Typically more than 1 OS image Additional software required: MPI, middleware Powerful interconnect Basis for scalability Cluster computers use COTS Slide 9

10 Cluster Interconnects Main differentiator against MPPs Non-proprietary Two classes of Cluster-Systems Capability Clusters Huge bandwidth, small latency Capacity Clusters Less powerful interconnect Embarrassingly parallel applications Gigabit Ethernet dominates the lower half of the TOP500 Mostly no real HPC applications Widely used in departmental HPC Significantly lower Linpack efficiency Slide 10

11 TOP500 Networks Slide 11

12 JuRoPA 2208 compute nodes 2 Intel Nehalem-EP quad-core processors 2.93 GHz SMT (Simultaneous Multithreading) 24 GB memory (DDR3, 1066 MHz) IB ConnectX QDR HCA (MT26428) / QNEM 17,664 cores, 207 TF peak Sun Microsystems Blade SB6048 Infiniband QDR with non-blocking Fat Tree topology ParaStation Cluster-OS Slide 12

13 JuRoPA Slide 13

14 HPC-FF 1080 compute nodes 2 Intel Nehalem-EP quad-core processors 2.93 GHz SMT (Simultaneous Multithreading) 24 GB memory (DDR3, 1066 MHz) IB ConnectX QDR HCA (MT26428) 8640 cores, 101 TF peak Bull NovaScale R422-E2 Infiniband QDR with non-blocking Fat Tree topology ParaStation Cluster-OS Slide 14
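The quoted core counts and peak ratings for JuRoPA and HPC-FF follow directly from the node configuration. Below is a minimal sketch of that arithmetic (my own illustration, not from the slides), assuming 4 double-precision flops per cycle and core, which is what the SSE add and multiply units of Nehalem-EP deliver:

```c
#include <stdio.h>

/* Back-of-the-envelope peak for JuRoPA / HPC-FF style nodes.
 * Assumes 4 DP flops/cycle/core (SSE add + multiply on Nehalem-EP). */
int main(void)
{
    const double ghz            = 2.93;   /* clock in GHz            */
    const int    flops_per_clk  = 4;      /* DP flops per core cycle */
    const int    cores_per_node = 2 * 4;  /* 2 sockets x 4 cores     */

    int nodes[] = { 2208, 1080 };         /* JuRoPA, HPC-FF          */
    const char *name[] = { "JuRoPA", "HPC-FF" };

    for (int i = 0; i < 2; i++) {
        long   cores = (long)nodes[i] * cores_per_node;
        double tflop = cores * ghz * flops_per_clk / 1000.0; /* GF -> TF */
        printf("%-7s %6ld cores  %.0f TF peak\n", name[i], cores, tflop);
    }
    return 0;
}
```

This reproduces the figures on the slides: 17,664 cores / 207 TF for JuRoPA and 8,640 cores / 101 TF for HPC-FF.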

15 HPC-FF Slide 15

16 Overall design schematic view The Future of Cluster-Computing Slide 16

17 Massively Parallel Processing MPP Main differentiators Interconnect Proprietary on MPP Integration Proprietary cooling, software, etc. Mainly two lines Cray XT-series AMD Opteron processors, Cray Seastar 3-D torus fabric Hard to distinguish from a Cluster from HW point of view Similar performance / scalability characteristics IBM BlueGene family (BG/L, BG/P) PowerPC 4x0-series processors, 3-D torus fabric + more Trade node-performance for energy-efficiency & balance Very scalable codes required More in the top 10% of TOP500 Slide 17

18 JUGENE: Jülich s Scalable Petaflop System IBM Blue Gene/P JUGENE 32-bit PowerPC 450 core 850 MHz, 4-way SMP 72 racks, 294,912 procs 1 Petaflop/s peak 144 TByte main memory connected to a Global Parallel File System (GPFS) with 5 PByte online disk capacity and up to 25 PByte offline tape capacity Torus network First Petaflop system in Europe Slide 18

19 The Jülich Dualistic Concept 2004: Constellation systems found unable to scale The portfolio of applications can be (very roughly) divided into two to three parts: Highly scalable codes, sparse matrix-vector like or dominated Highly complex codes, adaptive grids or coordinate based, all-to-all or more intricate communication patterns, large memory, less scalable Embarrassingly parallel codes, parameter studies Not our main focus: Farming, Grids, Clouds At that time JSC was unable to serve highly scalable codes JSC decided to adapt its hardware roadmap to this situation Slide 19

20 Jülich Dual Concept Hardware General-Purpose line: IBM Power 4+ JUMP, 9 TFlop/s (2004) Highly-Scalable line: IBM Blue Gene/L JUBL, 45 TFlop/s (2005/6); IBM Blue Gene/P JUGENE, 223 TFlop/s (2007/8) Shared file server: GPFS; since 2009 GPFS and Lustre Slide 20

21 Use by Science Field JUROPA ~ 200 Projects JUGENE ~40 Projects Slide 21

22 Balance Compute-power vs. Bandwidths Measure bandwidth in Bytes / Flop Memory aims for 1 Byte / Flop Not reached for most machines today (JuRoPA ~0.5 B/Flop) BlueGene trades compute-power for balance Only 850 MHz clock Memory wall Bandwidth not expected to grow with compute power Limited by # of connectors (optical links might help) Network aims for 10% of the memory-bandwidth System bus (PCIe) shares the same pins on the package An algorithm's surface/volume ratio determines the required bandwidth Slide 22
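The Byte/Flop argument can be made concrete with a simple kernel. The sketch below is my own illustration (not from the talk): it derives the machine balance of a JuRoPA-like node from the numbers given above, assuming three DDR3-1066 channels per socket, and compares it with the requirement of a DAXPY-like loop, which needs about 12 Bytes per Flop and is therefore heavily memory-bound:

```c
#include <stdio.h>

/* Machine balance vs. algorithmic requirement (illustrative numbers).
 * daxpy: y[i] = a*x[i] + y[i]  -> 2 flops, 24 bytes of traffic per element. */
int main(void)
{
    double peak_gflops   = 93.8;              /* one node: 8 cores * 2.93 GHz * 4 flops   */
    double mem_bw_gbytes = 2 * 3 * 8 * 1.066; /* 2 sockets * 3 DDR3-1066 channels * 8 B   */

    double machine_balance = mem_bw_gbytes / peak_gflops;  /* B/Flop offered  */
    double daxpy_balance   = 24.0 / 2.0;                   /* B/Flop required */

    printf("machine balance : %.2f B/Flop\n", machine_balance);
    printf("daxpy needs     : %.1f B/Flop -> memory-bound by ~%.0fx\n",
           daxpy_balance, daxpy_balance / machine_balance);
    return 0;
}
```

The resulting machine balance of roughly 0.55 B/Flop matches the ~0.5 B/Flop quoted for JuRoPA on the slide.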

23 Balance Compute-power vs. Latencies Measure latencies not in absolute time, but in operations Memory latencies are hidden by caches Today complex hierarchies (e.g. Nehalem L1/L2/L3) The algorithm has to exploit this via memory locality First level of parallelism Network latencies O(1) µsec Several thousand FP operations Hide them in the algorithm (asynchronous comm.) Latency Wall No significant progress expected for interconnects Algorithms might have to be adapted / changed Slide 23
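One concrete way to hide the O(1) µs network latency behind computation is to restructure a halo exchange so that the interior of the domain is processed while the boundary data is in flight. A minimal sketch, assuming hypothetical application kernels compute_interior() and compute_boundary():

```c
#include <mpi.h>

/* Placeholders for the application's kernels (hypothetical names). */
void compute_interior(void);
void compute_boundary(void);

/* Sketch: overlap interior computation with the halo exchange so that
 * the network latency is hidden behind useful work. */
void timestep(double *halo_out, double *halo_in, int n,
              int left, int right, MPI_Comm comm)
{
    MPI_Request req[2];

    /* post the halo exchange first ... */
    MPI_Irecv(halo_in,  n, MPI_DOUBLE, left,  0, comm, &req[0]);
    MPI_Isend(halo_out, n, MPI_DOUBLE, right, 0, comm, &req[1]);

    compute_interior();     /* ... work that needs no remote data */

    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    compute_boundary();     /* now the halo data has arrived */
}
```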

24 Rationale Can the next generation of cluster computers compete with proprietary solutions like Blue Gene or Cray? Blue Gene /P to /Q gives a factor of 20 in compute speed at the same energy envelope and cost within 4 years Cray is more dependent on processor development Standard processor speed will increase by about a factor of 4, at most 8, in 4 years Clusters need to utilize accelerators Current accelerators are tightly coupled to the interconnect Integrated processors are not expected before 2015 Slide 24

25 Accelerators FPGA Field Programmable Gate Array Programmable hardware The algorithm is transformed into logical circuits Significant effort to program VHSIC (Very High Speed Integrated Circuit) Hardware Description Language is significantly different from C or Fortran Only promising for selected applications Commodity FPUs are already very good at Multiply/Add Promising for non-FP applications (genome analysis) or non-pipelined FP-operations (astrophysics) We had a Cray XD1 equipped with FPGAs GRAPE: first sustained PFlop/s system ever (not in the TOP500) Slide 25

26 Accelerators ClearSpeed Put as many FPUs on a chip as possible Accompany them with fast memory Programming with standard C Pitfalls: Manual splitting of programs into host & accelerator parts Manual data-transfer between host & accelerator Not commodity Commodity kills the Performance-Star At some point GPUs became more powerful Slide 26

27 Accelerators Cell First heterogeneous multi-core 50 to 80 W at 3.2 GHz 1 PowerPC CPU (PPE) w/ 32 kB L1 caches (D/I) 8 SPEs w/ 256 kB private memory (Local Store) each SPE can do 4 FMAs per cycle 204.8 / 102.4 GFlop/s (SP/DP) at 3.2 GHz 512 kB on-chip shared L2 cache 204.8 GB/s EIB bandwidth 25.6 GB/s memory bandwidth Unfortunately killed by STI IBM claims its features will reappear in future Power designs Slide 27

28 Accelerators GPGPU Modern GPUs are basically powerful FPUs Excellent price-performance ratio Surfing the wave of gaming Still missing some features: Double Precision IEEE rounding ECC memory PCIe host capabilities NVIDIA, AMD (ATI); Intel announced MIC Slide 28

29 GPU-Accelerated Cluster [Schematic: compute nodes (CN), each with a directly attached GPU, connected by a flat InfiniBand fabric] Flat topology Simple management of resources Static assignment of accelerators to CPUs Accelerators cannot act autonomously Slide 29

30 GPU-Accelerated Cluster Explicit programming of GPUs required Applications have to be adapted CUDA (NVIDIA), OpenCL (AMD), TBB (Intel) Unclear which paradigm will survive Might be hidden in the future (compiler, global paradigm) PGI claims to support CUDA with their compilers Severe interference with considerations on balance Increase node-performance by a factor of ~10 Memory-bandwidth limited by PCIe Competing use of the PCIe bus No direct communication from the GPU: data travels GPU-mem → CPU-mem → IB → CPU-mem → GPU-mem Latency penalties Slide 30
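The missing direct path between GPU memory and the HCA means every message crosses host memory and the PCIe bus twice. A hedged sketch of what that staging looked like on a 2011-era system, using the CUDA runtime API and MPI (function and buffer names are my own, purely illustrative):

```c
#include <mpi.h>
#include <cuda_runtime.h>

/* Sketch of GPU-to-GPU communication on an accelerated cluster:
 * device -> host staging buffer -> InfiniBand -> host -> device.
 * Every hop adds latency and competes for PCIe bandwidth. */
void exchange(double *d_send, double *d_recv, double *h_buf,
              size_t n, int peer, MPI_Comm comm)
{
    size_t bytes = n * sizeof(double);

    cudaMemcpy(h_buf, d_send, bytes, cudaMemcpyDeviceToHost);   /* PCIe hop 1 */
    MPI_Sendrecv_replace(h_buf, (int)n, MPI_DOUBLE,
                         peer, 0, peer, 0, comm, MPI_STATUS_IGNORE);
    cudaMemcpy(d_recv, h_buf, bytes, cudaMemcpyHostToDevice);   /* PCIe hop 2 */
}
```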

31 Accelerated Cluster-node internal structure [Schematic: two CPUs with DDR3 memory, coupled via QPI/HT; the IB HCA and two GPUs with their GDDR5 memory all attach via PCIe through the southbridge] No direct communication from GPU to HCA Data passed via the CPU's memory GPUs and HCA compete for scarce PCIe resources Hard to find kernels to off-load: complex operations, communication, limited bandwidth, ... Slide 31

32 ExaScale Systems PetaFlop (10^15) systems are up and running Sustained PetaFlop for a broader range of applications coming soon (BlueWaters, etc.) History shows: each scale-up (factor 1000) takes ~10 years Look at the problems to expect for the next step: ExaFlop (10^18) Power consumption (are ~100 MW acceptable?) Resiliency What about I/O? How to program such a beast Programming models Do current algorithms still work out? Slide 32

33 ExaScale Challenges Energy Power consumption will increase in the future What is the critical limit? JSC has 5 MW, potential of 10 MW 1 MW costs roughly 1 M€ / year 20 MW expected to be the critical limit Are ExaScale systems a Large Scale Facility? The LHC uses 100 MW Energy efficiency Cooling uses a significant fraction (PUE > 1.2 today, target 1.0) Hot cooling water (40 °C and more) might help Free cooling: use outside air to cool the water Heat recycling: use waste heat for heating, cooling, etc. Slide 33
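The rule of thumb from the slide (roughly 1 M€ per MW and year) combined with the PUE makes the cooling overhead easy to quantify. A small illustrative calculation; the PUE values and the 5 MW IT load are assumptions for the example, not measured figures:

```c
#include <stdio.h>

/* Yearly electricity cost for a machine drawing `it_power_mw` of IT power,
 * using the rule of thumb of roughly 1 MEUR per MW and year.
 * PUE = total facility power / IT power; the values below are illustrative. */
int main(void)
{
    double it_power_mw      = 5.0;            /* e.g. a 5 MW IT load   */
    double meur_per_mw_year = 1.0;            /* ~1 MEUR / MW / year   */
    double pue[]            = { 1.5, 1.2, 1.05 };

    for (int i = 0; i < 3; i++) {
        double total_mw = it_power_mw * pue[i];
        printf("PUE %.2f: %4.1f MW facility power, ~%.1f MEUR/year "
               "(%.2f MEUR of that for cooling etc.)\n",
               pue[i], total_mw, total_mw * meur_per_mw_year,
               (total_mw - it_power_mw) * meur_per_mw_year);
    }
    return 0;
}
```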

34 ExaScale Challenges Resiliency Ever increasing number of components O(10000) nodes O(100000) DIMMs of RAM Each component's MTBF will not increase Optimistic: it remains constant Realistic: smaller structures and lower voltages decrease it Global MTBF will decrease Critical limit? 1 day? 1 hour? The time to write a checkpoint! How to handle failures Try to anticipate failures via monitoring Software must help to handle failures: checkpoints, process-migration, transactional computing Slide 34
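How often to checkpoint follows from the global MTBF and the time needed to write one checkpoint. Young's classic approximation, t_opt = sqrt(2 * t_checkpoint * MTBF), gives a feeling for the numbers; the sketch below is my own illustration and the input values (10 minutes per checkpoint, three MTBF scenarios) are assumptions:

```c
#include <stdio.h>
#include <math.h>

/* Young's approximation for the optimal checkpoint interval:
 *   t_opt = sqrt(2 * t_checkpoint * MTBF)
 * As the system-wide MTBF shrinks, the machine spends an ever larger
 * fraction of its time just writing checkpoints. */
int main(void)
{
    double t_ckpt = 600.0;                      /* 10 min to write a checkpoint */
    double mtbf[] = { 86400.0, 3600.0, 600.0 }; /* 1 day, 1 hour, 10 minutes    */

    for (int i = 0; i < 3; i++) {
        double t_opt = sqrt(2.0 * t_ckpt * mtbf[i]);
        printf("MTBF %6.0f s -> checkpoint every %5.0f s, overhead ~%4.1f %%\n",
               mtbf[i], t_opt, 100.0 * t_ckpt / t_opt);
    }
    return 0;
}
```

With a one-day MTBF the overhead stays around 6 %, at a one-hour MTBF it approaches 30 %, which is why the slide asks where the critical limit lies.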

35 ExaScale Challenges Applications Ever increasing levels of parallelism Thousands of nodes, hundreds of cores, dozens of registers Automatic parallelization vs. explicit exposure How large are coherency domains? How many languages do we have to learn? MPI + X most probably not sufficient 1 process / core makes orchestration of processes harder GPUs require explicit handling today (CUDA, OpenCL) What is the future paradigm MPI + X + Y? PGAS + X (+Y)? PGAS: UPC, Co-Array Fortran, X10, Chapel, Fortress, ... Which applications are inherently scalable enough at all? Slide 35
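The "MPI + X" discussion usually starts from a hybrid MPI/OpenMP setup, with one MPI process per node spawning one thread per core instead of one process per core. A minimal sketch of that pattern (my own example, not from the talk):

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* Minimal "MPI + X" sketch: MPI between nodes, OpenMP threads inside a node.
 * Avoids one MPI process per core, at the price of a second programming model. */
int main(int argc, char **argv)
{
    int provided, rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```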

36 Exascale Software Initiatives IESP: International Exascale Software Project Led by DOE (ANL/ORNL, Beckman/Dongarra) EESI: European Exascale Software Initiative (EDF, France) FP7: ICT Objective HPC Platforms with Exascale Performance PRACE Second implementation phase: scaling applications FET Flagship Initiative Supercomputing (technology beyond 2020; BSC, INRIA, JSC) G8: Interdisciplinary Program on Application Software towards Exascale Computing for Global Scale Issues Slide 36

37 ExaScale Innovation Center (EIC) Joint lab with IBM Böblingen Researchers located in Jülich Collaborating with scientists from the IBM lab in Yorktown Heights Energy efficiency Explore new cooling concepts on the basis of QPACE Future I/O Investigate I/O concepts for BlueGene/Q Programming Models Tools and algorithms for the ExaScale Slide 37

38 ExaCluster Laboratory (ECL) Joint lab with Intel Braunschweig and the ParTec Cluster Competence Center Researchers located in Jülich Challenges in system management software Improve scalability of ParaStation Reliable computing on unreliable components Development of a Cluster of Accelerators Prototype Intel MIC architecture (Knights Ferry aka Larrabee) Innovative interconnect architectures (3D-Torus) Slide 38

39 Some developments Target: Arrive at ExaScale at the end of the decade Have to enter the road today Unclear which road(s) will lead to ExaScale Maybe there has to be a completely new road? Some considerations Proprietary vs. commodity designs Are CPU designs at O(10^9) $ affordable for HPC? Are Clusters still capable? Do we need new ideas? Let's have a look at some interesting projects QPACE, BlueGene/Q, DEEP Slide 39

40 QPACE Name and Provenience QPACE: QCD PArallel computing on CEll Design of a massively parallel QCD prototype (with suitability for other applications in mind) Enhanced Cell BE processor Custom network processor (based on FPGA) Main development within the German special research focus SFB/TR 55 Hadron Physics, led by the University of Regensburg in cooperation with IBM Two installations: University of Wuppertal (3.x racks), JSC (1 rack + 3 racks owned by the University of Regensburg) #1 to #3 in the June 2010 Green500 list Systems have to be in the TOP500 list, ranked by energy efficiency (#4 achieved less than 500 MFlops / W) Slide 40

41 QPACE vs. RoadRunner Both based on Cell technology RoadRunner: #1 in the Top500 11/2008 First sustained Linpack PFlop/s system ever Accelerated node design Accelerator as co-processor QPACE: Special purpose system (QCD) Highly energy efficient Accelerator node design Network directly connected to the accelerator CPU Slide 41

42 QPACE Network processor Fast I/O fabric 2 FlexIO links to the CBE = 6 GB/s 6 x 10 GbE links to the network = 6 GB/s 1 GigE link for I/O Fast proprietary internal bus for the high-speed links Serial interfaces and config / status registers attached to the Device Control Register (DCR) Bus Slide 42

43 Jülich Installation Slide 43

44 BlueGene/Q IBM Sequoia Third generation BlueGene Projected for 2012 PowerPC processor w/ 16+1 cores 4-way SMT In-order design Thread-level speculation Transactional memory Integrated memory-controller 1 GB / core 32 compute nodes / drawer Water-cooled 5-D torus optical fabric 32 drawers / rack Slide 44

45 BlueGene/Q IBM Sequoia Sequoia to be installed at LLNL 96 racks / 1.6 million cores / 1.6 PB memory 20 PFlops / 6 MW It's already on its way: the first prototype system is in the TOP500 and No. 1 in the Green500 by MFlops/W (ahead of QPACE) JSC does some research on BlueGene/Q within EIC How to do the I/O at ExaScale Try to save servers by attaching storage directly to the I/O-nodes Slide 45

46 Accelerated Cluster vs. Cluster of Accelerators Cluster with Accelerators Each node has a classical host CPU Accompanied by one or more Accelerators Communication typically via main memory PCIe bus turns out to be a bottleneck Cluster of Accelerators Node consists of an Accelerator directly connected to the network Impossible with (most) current accelerators Accelerator requires a host CPU to boot Unable to talk directly to the network Accelerator not capable of running general purpose code (OS) See QPACE as a first example Slide 46

47 Some considerations on Scalability Only few applications are capable of scaling to O(300k) cores Sparse matrix-vector codes Highly regular communication patterns Well suited for BG/P Most applications have more complex kernels Complicated communication patterns Less capable of exploiting accelerators In fact: Highly scalable apps are dominated by highly scalable kernels Less scalable apps are dominated by less scalable kernels But there might be highly scalable kernels in them, too! How to improve their scalability? Slide 47

48 The ideal world [Schematic: cluster nodes (CN) and accelerators (Acc) all attached to the same low-latency fabric] Go for more capable accelerators (e.g. MIC) Attach all nodes to a low-latency fabric All nodes might act autonomously Dynamic assignment of cluster nodes and accelerators IB can be assumed to be as fast as PCIe, apart from latency Ability to off-load more complex (including parallel) kernels: communication between CPU and accelerator becomes less frequent, with larger messages, i.e. less sensitive to latency Slide 48

49 Proposal for a new Architecture DEEP [Schematic: a Cluster part of compute nodes (CN) on InfiniBand, connected through booster interfaces (BI) to a Booster part of booster nodes (BN)] Slide 49

50 Conclusions Clusters have dominated main-stream HPC in the last decade Surfing the commodity wave Proprietary systems at the highest end Accelerators will be required Surfing the next wave (gaming) Still unclear how to attach the network The road to ExaScale is unclear Are Clusters capable of reaching this goal? Is the Cluster idea expandable to ExaScale? New ideas in HPC-Architectures might be required Slide 50

51 Conclusions ExaScale introduces new challenges Energy Resiliency Input / Output Applications There will be ExaScale systems Sooner or later Unclear: how they will look, how general purpose they will be, how many applications are capable of making use of them Slide 51

52 Thank you Slide 52
