The BlueGene/L Supercomputer: Delivering Large Scale Parallelism

1 The BlueGene/L Supercomputer: Delivering Large Scale Parallelism José E. Moreira IBM T. J. Watson Research Center August 2004

2 Outline BlueGene/L high-level overview BlueGene/L system architecture and technology overview BlueGene/L software architecture BlueGene/L preliminary performance Conclusions 2

3 Outline BlueGene/L high-level overview BlueGene/L system architecture and technology overview BlueGene/L software architecture BlueGene/L preliminary performance Conclusions 3

4 The BlueGene/L project Executive summary A research partnership with Lawrence Livermore National Laboratory delivery of a 65,536-node machine in 2004/ Tflops peak, 32 Tbytes memory capacity 4096-node (8192-processor) prototype with pass 1 chips in operation Full system software stack and applications running Support for our Blue Gene science program in investigating the mechanisms of biologically important processes, particularly protein folding node system being planned for IBM Watson IBM working with collaborators to better understand application behavior on BlueGene/L and broaden the scope of the machine ASTRON in the Netherlands is getting a 6144-node machine Argonne National Laboratory is getting a 1024-node machine (more later) 4

5 BlueGene/L philosophy Some applications display very high levels of parallelism: they can be executed efficiently on tens of thousands of processors if: low-latency, high-bandwidth interconnect is present machine is reliable enough good compilers and programming models are available machine is manageable (simple to build and operate) Vastly improved price-performance for a class of applications by choosing simple low-power building block - highest possible single-threaded performance is not relevant, aggregate is! Scalable architecture and appropriate package can lead to a high density, low power, massively parallel system BlueGene/L = cellular architecture + aggressive packaging + scalable software 5

6 Compute density 2 Pentium 4 per 1U, 42 1U/rack: 84 processors/rack 2 Pentium 4 per blade (7U), 14 blades/chassis, 6 chassis/frame: 168 processors/rack BlueGene/L: 2 dual-CPU chips/compute card, 16 compute cards/node card, 16 node cards/midplane, 2 midplanes/rack: 2048 processors/rack 6
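
To make the rack arithmetic explicit, a tiny back-of-the-envelope calculation in C (a sketch; the variable names are just illustrative):

    #include <stdio.h>

    int main(void) {
        /* BlueGene/L packaging, per the compute-density slide above */
        int cpus_per_chip               = 2;
        int chips_per_compute_card      = 2;
        int compute_cards_per_node_card = 16;
        int node_cards_per_midplane     = 16;
        int midplanes_per_rack          = 2;

        int processors_per_rack = cpus_per_chip * chips_per_compute_card *
                                  compute_cards_per_node_card *
                                  node_cards_per_midplane * midplanes_per_rack;

        printf("BlueGene/L processors per rack: %d\n", processors_per_rack); /* 2048 */
        return 0;
    }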

7 BlueGene/L cellular architecture Start with a simple building block (or cell) that can be replicated ad infinitum as necessary aggregate performance is important Cell should have low power/high density characteristics, and should also be cheap to build and easy to interconnect Build a homogeneous collection of these cells, each with its own operating system image All cells have the same computational and communications capabilities (interchangeable from OS or application view) Integrated connection hardware provides a straightforward path to scalable systems with thousands/millions of cells 7

8 BlueGene/L hierarchical system software Organize the cells hierarchically form processing sets (psets) consisting of a collection of compute nodes under control of a master node From a system perspective, BlueGene/L will look like a 1024-way cluster of independent processing sets - a manageable size In BlueGene/L, each processing set contains 64 compute nodes which can be used exclusively for running user application processes Each processing set is under control of one Linux image that executes on a 65th node the I/O node From an application perspective, a flat view is maintained an application consists of a set of communicating compute processes, each running on a dedicated compute node 8
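
The pset arithmetic can be made concrete with a small sketch (illustrative only; pset_of is a hypothetical helper, not a BlueGene/L API):

    #include <stdio.h>

    /* 64 compute nodes share one I/O node per pset, so the full machine
     * collapses to 65,536 / 64 = 1,024 psets: the 1024-way cluster view. */
    enum { COMPUTE_NODES = 65536, NODES_PER_PSET = 64 };

    int pset_of(int compute_rank) {
        return compute_rank / NODES_PER_PSET;   /* 0 .. 1023 */
    }

    int main(void) {
        printf("total psets: %d\n", COMPUTE_NODES / NODES_PER_PSET);   /* 1024 */
        printf("rank 40000 belongs to pset %d\n", pset_of(40000));     /* 625  */
        return 0;
    }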

9 Outline BlueGene/L high-level overview BlueGene/L system architecture and technology overview BlueGene/L software architecture BlueGene/L preliminary performance Conclusions 9

10 BlueGene/L fundamentals A large number of nodes (65,536) Low-power (20W) nodes for density High floating-point performance System-on-a-chip technology Nodes interconnected as 64x32x32 three-dimensional torus Easy to build large systems, as each node connects only to six nearest neighbors full routing in hardware Bisection bandwidth per node is proportional to n^2/n^3 (i.e., 1/n for an n x n x n torus) Auxiliary networks for I/O and global operations Applications consist of multiple processes with message passing Strictly one process/node Minimum OS involvement and overhead 10
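
The six-neighbor property of the torus is easy to picture with a short sketch (illustrative code, not BlueGene/L system software): a node at coordinates (x, y, z) reaches its neighbors by stepping plus or minus one in each dimension, with wraparound at the edges.

    #include <stdio.h>

    /* 3D-torus neighbor computation for the 64 x 32 x 32 machine. */
    enum { NX = 64, NY = 32, NZ = 32 };

    static int wrap(int c, int n) { return (c + n) % n; }  /* torus wraparound */

    int main(void) {
        int x = 0, y = 31, z = 5;   /* an arbitrary node */
        printf("x neighbors: (%d,%d,%d) and (%d,%d,%d)\n",
               wrap(x - 1, NX), y, z, wrap(x + 1, NX), y, z);
        printf("y neighbors: (%d,%d,%d) and (%d,%d,%d)\n",
               x, wrap(y - 1, NY), z, x, wrap(y + 1, NY), z);
        printf("z neighbors: (%d,%d,%d) and (%d,%d,%d)\n",
               x, y, wrap(z - 1, NZ), x, y, wrap(z + 1, NZ));
        return 0;
    }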

11 BlueGene/L interconnection networks 3 Dimensional Torus Interconnects all compute nodes (65,536) Virtual cut-through hardware routing 1.4Gb/s on all 12 node links (2.1 GB/s per node) Communications backbone for computations 350/700 GB/s bisection bandwidth 11
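
Those bisection figures follow from the link rate: a cut across the longest (64-node) dimension of the 64x32x32 torus is crossed by 2 x 32 x 32 links in each direction (the factor of 2 comes from the torus wraparound). A quick check, assuming that accounting:

    #include <stdio.h>

    int main(void) {
        /* 64x32x32 torus cut across the longest dimension */
        double links_one_way = 2.0 * 32 * 32;   /* 2048 links per direction    */
        double link_gbps     = 1.4;             /* Gb/s per link, per slide 11 */
        double one_way_GBs   = links_one_way * link_gbps / 8.0;   /* ~358 GB/s */
        printf("bisection: ~%.0f GB/s one way, ~%.0f GB/s bidirectional\n",
               one_way_GBs, 2.0 * one_way_GBs);  /* ~350 / ~700 GB/s */
        return 0;
    }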

12 BlueGene/L fundamentals (continued) Machine should be dedicated to execution of applications, not system management Avoid asynchronous events (e.g., daemons, interrupts) Avoid complex operations on compute nodes The I/O node: an offload engine System management functions are performed in a (N+1)th node I/O (and other complex operations) are shipped from compute node to I/O node for execution Number of I/O nodes adjustable to needs (N=64 for BG/L) This separation between application and system functions allows compute nodes to focus on application execution Communication to the I/O nodes must be through a separate tree interconnection network to avoid polluting the torus 12

13 BlueGene/L interconnection networks 3 Dimensional Torus Interconnects all compute nodes (65,536) Virtual cut-through hardware routing 1.4 Gb/s on all 12 node links (2.1 GB/s per node) Communications backbone for computations 350/700 GB/s bisection bandwidth Global Tree One-to-all broadcast functionality Reduction operations functionality 2.8 Gb/s of bandwidth per link Latency of tree traversal in the order of 2 µs Interconnects all compute and I/O nodes (1024) 13

14 BlueGene/L interconnection networks 3 Dimensional Torus Interconnects all compute nodes (65,536) Virtual cut-through hardware routing 1.4 Gb/s on all 12 node links (2.1 GB/s per node) Communications backbone for computations 350/700 GB/s bisection bandwidth Global Tree One-to-all broadcast functionality Reduction operations functionality 2.8 Gb/s of bandwidth per link Latency of tree traversal in the order of 2 µs Interconnects all compute and I/O nodes (1024) Gigabit Ethernet Incorporated into every node ASIC Active in the I/O nodes (1:64) All external comm. (file I/O, control, user interaction, etc.) 14
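
From the application's point of view, the tree network is reached through ordinary MPI collectives such as broadcast and reduction; a minimal, generic example (standard MPI calls, nothing BlueGene/L-specific in the source itself):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value = 0;
        double local, global;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) value = 42;
        /* One-to-all broadcast: maps naturally onto the tree network. */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

        local = (double)rank;
        /* Reduction: also a tree operation on BlueGene/L. */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0) printf("sum of ranks = %.0f\n", global);
        MPI_Finalize();
        return 0;
    }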

15 BlueGene/L fundamentals (continued) Machine monitoring and control should also be offloaded Machine boot, health monitoring Performance monitoring Concept of a service node Separate nodes dedicated to machine control and monitoring Non-architected from a user perspective Control network A separate network that does not interfere with application or I/O traffic Secure network, since operations are critical By offloading services to other nodes, we let the compute nodes focus solely on application execution without interference BG/L is the extreme opposite of time sharing: everything is space shared 15

16 BlueGene/L control system Control and monitoring are performed by a set of processes executing on the service nodes Non-architected network (Ethernet + JTAG) supports non-intrusive monitoring and control Core Monitoring and Control System Controls machine initialization and configuration Performs machine monitoring Two-way communication with node operating system Integration with database for system repository 16

17 BlueGene/L system architecture overview [Diagram: console, scheduler, front-end nodes, file servers, and the service node (DB2 + MMCS) attach over the functional Gigabit Ethernet to the I/O nodes; each pset (Pset 0 through Pset 1023) pairs one I/O node running Linux and ciod with compute nodes C-Node 0 through C-Node 63 running CNK, linked by the tree and torus networks; the service node reaches the hardware through a separate control Gigabit Ethernet, IDo chips, JTAG, and I2C] 17

18 BlueGene/L compute chip [Diagram: two 440 CPUs, each with a double FPU, 32k/32k L1 caches, and prefetch buffers, attached via the PLB (4:1) to a multiported SRAM buffer with locks; a shared L3 directory for EDRAM (with ECC) in front of a 4MB multibank EDRAM L3 cache; DDR control with ECC to a 144-bit-wide external DDR (256MB) at 5.6 GB/s; link buffers and routing for 6 bidirectional 1.4 Gb/s torus links and the tree; Gbit Ethernet and JTAG access] 18
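
A back-of-the-envelope peak-performance estimate, assuming (as commonly cited for this chip) that each double FPU can retire two fused multiply-adds, i.e. four flops per cycle per CPU:

    #include <stdio.h>

    int main(void) {
        double clock_ghz      = 0.7;    /* 700 MHz production parts              */
        int    flops_per_core = 4;      /* assumed: 2 FMAs/cycle on the double FPU */
        int    cores_per_node = 2;
        int    nodes          = 65536;

        double node_gflops = clock_ghz * flops_per_core * cores_per_node;   /* 5.6 */
        printf("peak per node: %.1f Gflop/s\n", node_gflops);
        printf("peak full system: %.0f Tflop/s\n",
               node_gflops * nodes / 1000.0);                               /* ~367 */
        return 0;
    }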

19 BlueGene/L 19

20 Complete BlueGene/L system at LLNL 20

21 BlueGene/L system 21

22 Outline BlueGene/L high-level overview BlueGene/L system architecture and technology overview BlueGene/L software architecture BlueGene/L preliminary performance Conclusions 22

23 Familiar software environment Fortran, C, C++ with MPI Full language support Automatic SIMD FPU exploitation Linux development environment User interacts with system through FE nodes running Linux compilation, job submission, debugging Compute Node Kernel provides look and feel of a Linux environment POSIX system calls (with restrictions) Tools support for debuggers, hardware performance monitors, trace based visualization 23
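
In practice that familiar environment means an ordinary MPI program with ordinary POSIX-style I/O compiles and runs unchanged; a minimal sketch (the output file name is just illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Ordinary C I/O: on BlueGene/L this is function-shipped by CNK to
         * the pset's I/O node (see the file-I/O slides later in the talk). */
        if (rank == 0) {
            FILE *f = fopen("hello.out", "w");   /* illustrative file name */
            if (f) {
                fprintf(f, "hello from %d MPI processes\n", nprocs);
                fclose(f);
            }
        }

        MPI_Finalize();
        return 0;
    }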

24 Scalability through simplicity Performance Strictly space sharing - one job (user) per electrical partition of machine, one process per compute node Dedicated processor for each application level thread Guaranteed, deterministic execution Physical memory directly mapped to application address space no TLB misses, page faults Efficient, user mode access to communication networks No protection necessary because of strict space sharing Multi-tier hierarchical organization system services (I/O, process control) offloaded to IO nodes, control and monitoring offloaded to service node No daemons interfering with application execution System manageable as a cluster of IO nodes Reliability, availability, serviceability Reduce software errors - simplicity of software, extensive run time checking option Ability to detect, isolate, possibly predict failures 24

25 BlueGene/L software architecture User applications execute exclusively on the compute nodes and see only the application volume as exposed by the user-level APIs The outside world interacts only with the I/O nodes and processing sets (I/O node + compute nodes) they represent, through the operational surface functionally, the machine behaves as a cluster of I/O nodes Internally, the machine is controlled through the service nodes in the control surface goal is to hide this surface as much as possible 25

26 BlueGene/L software architecture rationale We view the system as a cluster of 1,024 I/O nodes, thereby reducing a 65,536-node machine to a manageable size From a system perspective, user processes are controlled from the I/O node: Process management (start, signaling, kill) Process debugging Authentication, authorization System management daemon Traditional LINUX operating system User processes actually execute on compute nodes Simple single-process (two threads) kernel on compute node Also possible two single-threaded processes per compute node One processor/thread provides fast, predictable execution User process extends into I/O node for complex operations Application model is a collection of private-memory processes communicating through messages This is an MPI machine! 26

27 Software stack in BlueGene/L compute node CNK controls all access to hardware, and enables bypass for application use User-space libraries and applications can directly access torus and tree through bypass As a policy, user-space code should not directly touch hardware, but there is no enforcement of that policy Application code can use both processors in a compute node [Stack diagram: application code and user-space libraries above CNK, with a bypass path from user space straight to the BG/L ASIC] 27

28 File I/O in BlueGene/L [Diagram: the application and libc run above CNK on the compute node's BG/L ASIC; ciod runs on Linux with NFS/IP on the I/O node's BG/L ASIC; compute and I/O nodes are joined by the tree network, and the I/O node reaches the file server over Ethernet] 28

29 File I/O in BlueGene/L [Step 1: the application calls fscanf] 29

30 File I/O in BlueGene/L [Step 2: libc turns the fscanf into a read system call into CNK] 30

31 File I/O in BlueGene/L [Step 3: CNK ships the read request as tree packets through the BG/L ASIC to ciod on the I/O node] 31

32 File I/O in BlueGene/L [Step 4: ciod issues the read to Linux, which forwards it via NFS over IP and Ethernet to the file server] 32

33 File I/O in BlueGene/L [Step 5: the file server's reply returns through NFS to ciod on the I/O node] 33

34 File I/O in BlueGene/L [Step 6: the data travels back as tree packets to CNK and is delivered to the application] 34
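
The function-shipping idea in the preceding slides can be summarized in a small self-contained sketch. All names here (io_request, cnk_read, ciod_serve) are illustrative, not the real CNK/ciod interfaces, and the tree-packet transport is stood in for by a direct call:

    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>

    /* Illustrative request format; the real CNK/ciod protocol differs. */
    struct io_request { int op; int fd; size_t count; };

    /* "I/O node" side: ciod performs the real POSIX read on behalf of the
     * compute process and would return the data as tree packets. */
    static ssize_t ciod_serve(const struct io_request *req, char *reply, size_t max) {
        size_t n = req->count < max ? req->count : max;
        return read(req->fd, reply, n);
    }

    /* "Compute node" side: CNK does not execute read() locally; it packs the
     * arguments into a request and ships it to the I/O node. */
    static ssize_t cnk_read(int fd, char *buf, size_t count) {
        struct io_request req = { 0 /* READ */, fd, count };
        return ciod_serve(&req, buf, count);   /* stands in for tree send/receive */
    }

    int main(void) {
        char buf[64];
        int fd = open("/etc/hostname", O_RDONLY);   /* any readable file */
        if (fd < 0) return 1;
        ssize_t n = cnk_read(fd, buf, sizeof buf - 1);
        if (n > 0) { buf[n] = '\0'; printf("read %zd bytes: %s", n, buf); }
        close(fd);
        return 0;
    }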

35 Modes of operation of BlueGene/L compute node Coprocessor mode: CPU 0 executes main thread of user application while CPU 1 is an off-load engine (asymmetric mode) CPU 1 runs much of message-passing protocol Computations can also be off-loaded with an asynchronous coroutine model (co_start / co_join) Preferred mode of operation for communication-intensive and memory bandwidth-intensive codes Need careful sequence of cache line flush, invalidate, and copy operations to deal with lack of L1 cache coherence in hardware handled in MPI library for communications Virtual node mode: CPU0 and CPU1 each behave as an independent virtual node, with half the memory of physical node (symmetric mode) Two MPI processes on each physical node, one bound to each processor Private memory semantics lack of L1 coherence not a problem Intra-node (physical) communication is performed through non-L1-coherent memory region 35

36 The coroutine model

main processor:
    int main(argc, argv) {
        hdl = co_start(foo);   /* launch foo on the coprocessor (CPU 1) */
        rc = co_join(hdl);     /* wait for foo and collect its result   */
    }

coprocessor:
    int foo(x) {
        return y;
    }

36

37 The BlueGene/L MPICH2 organization [Diagram: message-passing and process-management sides of MPICH2; MPI point-to-point, datatype, topology, and collectives sit on the Abstract Device Interface, with CH3 channels (socket, MM, bgltorus); the bgltorus message layer drives the torus, tree, and GI hardware through the Torus Packet Layer, Tree Packet Layer, and GI Device; process management goes through PMI (mpd, uniprocessor, simple) and the CIO Protocol] 37

38 Outline BlueGene/L high-level overview BlueGene/L system architecture and technology overview BlueGene/L software architecture BlueGene/L preliminary performance Conclusions 38

39 BlueGene/L status at a glance Status as of today: 4,096-compute-node system at IBM Rochester, MN 1st pass chips, good for software development and testing, running at 500 MHz It is #4 in the TOP500 list of June 2004 (after Earth Simulator, ASCI Thunder, and ASCI Q) 2048-node system at IBM T. J. Watson, NY 2nd pass chips, production quality, running at 700 MHz, is #8 in the TOP500 What does that mean? Each node is dual processor, so BlueGene/L already has as many processors as ASCI Q and ASCI White 16 TF of peak (@500 MHz) is faster than ASCI White All this on 40 sq. ft. of floor space! 4 racks as opposed to 100+ for White and Q! LLNL machine will be 16 times bigger and 40% faster! 39

40 ASCI White, delivered by IBM to LLNL

41 BlueGene/L in pictures (4-rack system) 41

42 Initial benchmark results Multi-node (MPI) benchmarks NAS benchmarks ASCI-class benchmarks and applications Comparison with other systems NOTE: Some results are for 500 MHz, others for 700 MHz. Some results are for weak scaling, others for strong scaling. Some results were measured by IBM, others by our collaborators If you are not confused now, you will be by the end of this talk! 42

43 Noise measurements 43

44 Measured MPI Send Bandwidth and Latency [Plot: bandwidth at 700 MHz vs. message size (bytes) when sending to 1 through 6 neighbors simultaneously; latency at 500 MHz fits a linear model with terms proportional to Manhattan distance and midplane hops, in µs] 44

45 Variation of bandwidth (500 MHz) 45

46 LINPACK summary (500 MHz) [Chart: LINPACK performance in Gflops vs. number of nodes] Single-node 80% of peak 512-node 70% of peak, 1435 Gflop/s, #73 on Nov 2003 TOP500 list, same level of performance as a cluster of Xeon 3.06 GHz with 800 processors 4096-node Achieved Gflop/s Continues to sustain better than 70% of peak (improvements in code) Achieved #4 in June 2004 TOP500 The 700 MHz system achieved 8650 Gflop/s and is #8 in the TOP500 Has been a great test case/driver for the hardware and system software
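
The 70%-of-peak figure for 512 nodes checks out against the raw numbers, assuming 4 flops per cycle per processor at 500 MHz:

    #include <stdio.h>

    int main(void) {
        double node_peak_gflops = 2 * 4 * 0.5;   /* 2 CPUs x 4 flops x 0.5 GHz = 4 Gflop/s */
        double peak_512         = 512 * node_peak_gflops;   /* 2048 Gflop/s */
        double linpack_512      = 1435.0;                   /* measured, per the slide */
        printf("512-node efficiency: %.0f%% of peak\n",
               100.0 * linpack_512 / peak_512);              /* ~70 */
        return 0;
    }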

47 NAS Parallel Benchmarks Class C on 256 nodes [Chart: Mop/s/node for BG/L vs. T3E on BT, CG, EP, FT, IS, LU, MG, SP] All NAS Parallel Benchmarks run successfully on 256 nodes (and many other configurations) Used class C (largest class with published results) No tuning / code changes Compared 500 MHz BG/L and 450 MHz Cray T3E All BG/L benchmarks were compiled with GNU and XL compilers Report best result (GNU for IS) BG/L is a factor of two/three faster on five benchmarks (BT, FT, LU, MG, and SP), a bit slower on one (EP) 47

48 SPPM on fixed machine size (BG/L 700 MHz) [Chart: SPPM computational performance, rate in points/sec/node vs. grid dimension, for 32 BG/L nodes (virtual node mode and coprocessor mode) and 32 p655 processors] 48

49 SPPM on fixed grid size (BG/L 700 MHz) [Chart: SPPM scaling on a 128**3 real*8 grid, rate in points/sec/node vs. number of BG/L nodes / p655 processors, comparing p655, BG/L virtual node mode, and BG/L coprocessor mode] 49

50 SAGE on fixed machine size (BG/L 700 MHz) [Chart: SAGE computational performance on the timing_h input, rate in cells/node/sec vs. cells per node, for 8 BG/L nodes (virtual node and coprocessor modes) and p655 processors] 50

51 Effect of mapping on SAGE [Chart: SAGE scaling (timing_h), processing rate in cells/sec/cpu vs. number of processors, for random, default, and heuristic task mappings] 51

52 Sweep3D results [Chart: aggregate rate in cells/sec vs. number of ASCI Q processors / BG/L physical nodes, comparing ASCI Q with BG/L (700 MHz) in virtual node and coprocessor modes] 52

53 Miranda results (by LLNL) [Chart: Miranda scaling on BG/L (500 MHz) and MCR (Charles Crab), running time in seconds vs. processor count] 53

54 ParaDis on 700 MHz BG/L and MCR 54

55 HPC Challenge: Random Access [Chart: MPI Random Access updates in GUP/s (more is better) for Cray X1/ORNL (64), HP Alpha/PSC (128), HP Itanium2/OSC (128), and BG/L/YKT (64)] 55

56 HPC Challenge: Latency [Chart: random-ring latency in microseconds (less is better) for Cray X1/ORNL (64), HP Alpha/PSC (128), HP Itanium2/OSC (128), and BG/L/YKT (64)] 56

57 Outline BlueGene/L high-level overview BlueGene/L system architecture and technology overview BlueGene/L software architecture BlueGene/L preliminary performance Conclusions 57

58 Conclusions BlueGene/L represents a new level of performance scalability and density for scientific computing Third-generation machine architecture Leverages system-on-a-chip technology and nearest-neighbor interconnect Very low latency, high-bandwidth communication, with small n_1/2 (problem size for achieving half of maximum performance) BlueGene/L system software stack with Linux-like personality for applications Custom solution (CNK) on compute nodes for highest performance Linux solution on I/O nodes for flexibility and functionality MPI is the default programming model, others are being investigated BlueGene/L is testing software approaches for management/operation of very large scale machines Hierarchical organization for management Flat organization for programming Mixed conventional/special-purpose operating systems Encouraging performance results! Many challenges ahead, particularly in scalability and reliability, as we grow to bigger system sizes 58
