
1 Early experience with Blue Gene/P. Jonathan Follows, IBM United Kingdom Limited. HPCx Annual Seminar, 26th November 2007

2 Agenda
- System components
- The Daresbury BG/P and BG/L racks
- How to use the system
- Some early results
- References

3 IBM System Blue Gene/P Solution: Expanding the Limits of Breakthrough Science. Offering Schematic

Blue Gene/P continues Blue Gene's leadership performance in a space-saving, power-efficient package for the most demanding and scalable high-performance computing applications.

Packaging hierarchy:
- Chip: 4 processors; 13.6 GF/s; 8 MB EDRAM
- Compute Card: 1 chip, 20 DRAMs; 13.6 GF/s; 2.0 GB DDR; supports 4-way SMP
- Node Card: 32 chips (4x4x2); 32 compute, 0-2 I/O cards; 435 GF/s; 64 GB
- Rack: 32 node cards; 1024 chips, 4096 processors; 14 TF/s; 2 TB
- System: 1 to 72 or more racks, cabled 8x8x16; 1 PF/s; 144 TB+
- Front End Node / Service Node: System p servers, Linux SLES10
- HPC software: compilers, GPFS, ESSL, LoadLeveler

4 IBM System Blue Gene/P Solution: configuration details and benefits

Processor: IBM PowerPC 450, 850 MHz; four per node.
  Benefit: low power allows dense packaging; better processor-memory balance.
Memory: 2 GB SDRAM-DDR per node (model dependent).
  Benefit: wider application reach.
Networks: 3D torus, 5.1 GB/s, 3.5 usec latency; collective network, 1.7 GB/s, 2.5 usec latency; global barrier/interrupt; optical 10 Gigabit Ethernet (machine control and outside connectivity); 1 Gb control network (system boot, debug, monitoring).
  Benefit: special networks speed up internode communications; designed for MPI programming constructs; improve systems management.
Compute nodes: quad SMP processor chip; 1024 per rack.
  Benefit: double FPU improves performance.
I/O nodes (10 GbE): quad SMP processor; configurable from 8 to 64 per rack; 8 is the default configuration.
  Benefit: increases relative I/O performance.
Operating systems: compute node runs a lightweight proprietary kernel (customized Linux possible for small systems); I/O node runs Linux; front-end and service nodes run SUSE Linux SLES 10.
  Benefit: kernel tailored to processor design; industry-standard distribution preserves familiarity to the end user.
Performance: 13.9 teraflops peak per rack.
  Benefit: highest available performance benefits capability customers.
Power: 40 kW power consumption per rack (maximum); VAC 3-phase; 175 amp service per rack.
  Benefit: low power draw enables dense packaging.
Cooling: air conditioning, ~13 tons/rack (minimum).
  Benefit: low cooling requirements enable extreme scale-up.
Dimensions (includes air duct): height 1956 mm, width 1220 mm, depth 966 mm; weight 782 kg; service clearance 914 mm; raised floor height 16 in. minimum, 48 in. recommended.
  Benefit: design allows brick-wall layout for better floor space utilization.

5 Blue Gene/P node (block diagram)
- Four PPC 450 cores, each with its FPU, L1 cache and a prefetching L2, attached over 6.8 GB/s data paths
- Multiplexing switches connect the cores to two 4 MB eDRAM L3 banks (13.6 GB/s read each, 13.6 GB/s write each) and two DDR-2 controllers
- 13.6 GB/s external DDR2 DRAM bus (2 x 16B at 425 Mb/s)
- DMA module allows remote direct put/get; 4 symmetric ports for tree, torus and global barriers
- Network interfaces: torus (6 x 3.4 Gb/s bidirectional), collective (3 x 6.8 Gb/s bidirectional), barrier, and 10 Gb Ethernet to a 10 Gb physical layer (shares I/O with torus)
- JTAG control network

6 Blue Gene/P compute card (photo labels)
- BPC ASIC: 4 cores, 8 MB, Cu heatsink
- DRAM address termination (Vtt), EEPROM, decoupling, monitoring
- SDRAM-DDR2, 1 of 20 sites

7 Blue Gene Software Hierarchical Organization
- Compute nodes are dedicated to running the user application, and almost nothing else: the simple compute node kernel (CNK)
- I/O nodes run Linux and provide a more complete range of OS services: files, sockets, process launch, signalling, debugging and termination
- The service node performs system management services (e.g. partitioning, heartbeating, monitoring errors), transparent to application software
- Front-end nodes and the file system connect over 10 Gb Ethernet; control traffic uses 1 Gb Ethernet

8 Blue Gene/P Application Highlights: Multiple Run Modes
- SMP mode: 1 MPI process per node, with 1, 2, 3 or 4 threads per process. Use for mixed MPI/OpenMP programs, or for MPI programs which need more than 1 GB per process (only 1 core active per node).
- Dual mode: 2 MPI processes per node, with 1 or 2 threads per process. Use for mixed MPI/OpenMP programs, or for MPI programs which need more than 0.5 GB per process (only 2 cores active per node).
- Virtual node mode: 4 MPI processes per node. Use for MPI-only programs which need less than 0.5 GB per process.
(The sketch below shows how a mode is selected at launch time.)
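The run mode is chosen when the job is launched, not when it is compiled. A minimal sketch using the BG/P mpirun front end; the partition name R00-M0 and the executable paths are placeholders, and on the Daresbury system the same flags are passed to mpirun through LoadLeveler's arguments line, as shown on slide 12:

mpirun -partition R00-M0 -mode SMP -np 32 -env OMP_NUM_THREADS=4 -exe /home/jfo/hello_omp
mpirun -partition R00-M0 -mode DUAL -np 64 -env OMP_NUM_THREADS=2 -exe /home/jfo/hello_omp
mpirun -partition R00-M0 -mode VN -np 128 -exe /home/jfo/hello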

9 Programming Models
- MPI only: virtual node mode with enhancements
  - A separate MPI process for each processor in the compute node
  - DMA support for each MPI process: ensures the network does not block while the processor is computing, and drives the network harder
  - Sharing of read-only or write-once data on each node: needs a programming-language extension to identify read-only data, but allows applications to overcome the memory limits of virtual node mode
- MPI + OpenMP (see the sketch after this list)
  - OpenMP within each node relies on cache-coherence support
  - Only the master thread on each node initiates communication: this gets the benefits of message aggregation and can exploit multiple processors to service an MPI call
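As an illustration of the hybrid model just described, a minimal Fortran sketch (hypothetical, not from the talk) in which the threads share the compute loop but all MPI traffic stays on the master thread:

      program hybrid
      implicit none
      include 'mpif.h'
      integer rank, nprocs, ierror, i
      double precision total, gtotal
      call MPI_INIT(ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      total = 0.0d0
!     All cores of the node cooperate on the loop (SMP or dual mode)
!$OMP PARALLEL DO REDUCTION(+:total)
      do i = 1, 1000
         total = total + dble(rank + i)
      enddo
!$OMP END PARALLEL DO
!     Outside the parallel region only the master thread runs, so
!     MPI sees one caller per process (funneled threading)
      call MPI_REDUCE(total, gtotal, 1, MPI_DOUBLE_PRECISION,
     &                MPI_SUM, 0, MPI_COMM_WORLD, ierror)
      if (rank .eq. 0) print *, 'global sum =', gtotal
      call MPI_FINALIZE(ierror)
      end

Compiled with a thread-safe wrapper and -qsmp=omp and run in SMP mode, each node runs one MPI process whose threads share the loop, and message aggregation happens naturally on the master thread.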

10 Blue Gene in Daresbury

11 Compiling on FEN

macmini:~ jonathanfollows$ ssh
############################ W A R N I N G ###########################
# This is a private computer facility. Access for any reason must be #
# specifically authorised by the owner. Unless you are so authorised,#
# your continued access and any other use may expose you to criminal #
# and/or civil proceedings.                                          #
######################################################################
Last login: Fri Nov 23 09:06: from host range btcentralplus.com
jfo@bglogin2:~$ cat hello_mpi.f
      include 'mpif.h'
      integer rank, size, ierror, tag, status(mpi_status_size)
      character(12) message
      call MPI_INIT(ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      tag = 100
      if (rank .eq. 0) then
         message = 'Hello, world'
         do i = 1, size-1
            call MPI_SEND(message, 12, MPI_CHARACTER, i, tag,
     &                    MPI_COMM_WORLD, ierror)
         enddo
      else
         call MPI_RECV(message, 12, MPI_CHARACTER, 0, tag,
     &                 MPI_COMM_WORLD, status, ierror)
      endif
      print *, 'node', rank, ':', message
      call MPI_FINALIZE(ierror)
      end
jfo@bglogin2:~$ mpixlf90 hello_mpi.f -o hello
** _main === End of Compilation 1 ===
Compilation successful for file hello_mpi.f.
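mpixlf90 is one of a family of MPI cross-compiler wrappers on the front end. A sketch of the family, assuming the thread-safe _r variants follow the usual IBM XL naming (file names are placeholders):

jfo@bglogin2:~$ mpixlc prog.c -o prog            # XL C
jfo@bglogin2:~$ mpixlcxx prog.cpp -o prog        # XL C++
jfo@bglogin2:~$ mpixlf77 prog.f -o prog          # XL Fortran 77
jfo@bglogin2:~$ mpixlf90_r prog_omp.f -qsmp=omp -o prog_omp   # thread-safe, OpenMP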

12 Running on BG/P

jfo@bglogin2:~$ cat run_hello
arguments = -np 32 -cwd /home/jfo -exe /home/jfo/hello
#@ class = BGP
#@ input = /dev/null
#@ output = out/$(jobid).out
#@ error = out/$(jobid).err
#@ wall_clock_limit=01:00:00
#@ notification = complete
#@ queue
jfo@bglogin2:~$ llq
llq: There is currently no job status to report.
jfo@bglogin2:~$ llsubmit run_hello
Use of uninitialized value in string eq at /home/loadp/bluegene-filter.pl line 120.
llsubmit: Processed command file through Submit Filter: "/home/loadp/bluegene-filter.pl".
llsubmit: The job "bglogin2.dl.ac.uk.594" has been submitted.
jfo@bglogin2:~$ llq
Id       Owner  Submitted    ST  PRI  Class  Running On
bglogin  jfo    11/23 09:08  I   50   BGP
1 job step(s) in queue, 1 waiting, 0 pending, 0 running, 0 held, 0 preempted

13 cat 594.out
node 6 :Hello, world
node 5 :Hello, world
node 13 :Hello, world
node 1 :Hello, world
node 29 :Hello, world
node 30 :Hello, world
node 20 :Hello, world
node 24 :Hello, world
node 8 :Hello, world
node 22 :Hello, world
node 26 :Hello, world
node 12 :Hello, world
node 11 :Hello, world
node 27 :Hello, world
node 10 :Hello, world
node 4 :Hello, world
node 9 :Hello, world
node 18 :Hello, world
node 16 :Hello, world
node 2 :Hello, world
node 0 :Hello, world
node 21 :Hello, world
node 25 :Hello, world
node 7 :Hello, world
node 14 :Hello, world
node 23 :Hello, world

Trivial results

14 Marginally less trivial!

#---------------------------------------------------
# Benchmarking PingPong
# ( #processes = 2 )
# ( 30 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
 #bytes #repetitions  t[usec]  Mbytes/sec
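These headers are the standard output of the Intel MPI Benchmarks (IMB). Assuming an IMB-MPI1 binary cross-compiled as on slide 11 (paths hypothetical), the PingPong run above corresponds to a job-file line such as:

arguments = -np 32 -cwd /home/jfo/imb -exe /home/jfo/imb/IMB-MPI1 -args "PingPong"

With -np 32, PingPong itself uses 2 processes, which matches the "30 additional processes waiting in MPI_Barrier" line in the header.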

15 Alltoall

#---------------------------------------------------
# Benchmarking Alltoall
# ( #processes = 128 )
#---------------------------------------------------
 #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]

16 Barrier

#---------------------------------------------------
# Benchmarking Barrier
# ( #processes = 128 )
#---------------------------------------------------
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
#=====================================================

17 ESSL for BG/P

Just be careful, because of the cross-compilation environment, to pick up the correct libraries!

Native libraries for Linux/POWER (run on the front end):
/usr/lib64/libessl.so
/usr/lib64/libesslsmp.so

Libraries for BG/P (link these into compute-node executables):
/opt/ibmmath/lib/libesslbg.a, libesslbg.so.1.3
/opt/ibmmath/lib/libesslsmpbg.a, libesslsmpbg.so.1.3
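A link-line sketch for picking up the BG/P libraries rather than the native ones; the source file name is hypothetical, and the thread-safe wrapper is assumed to follow the usual IBM XL _r naming:

jfo@bglogin2:~$ mpixlf90 solver.f -o solver -L/opt/ibmmath/lib -lesslbg
jfo@bglogin2:~$ mpixlf90_r solver.f -qsmp=omp -o solver_smp -L/opt/ibmmath/lib -lesslsmpbg

The serial library (-lesslbg) suits single-threaded processes; the SMP library (-lesslsmpbg) suits threaded SMP or dual-mode runs.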

18 References
- Machine status, request for access
- Other IBM information: search for "Blue Gene"
- "Blue Gene/P Application Development" and other IBM Redbooks titles

19 In conclusion
- HPCx users should feel comfortable: BG/L and BG/P are in the same machine room as HPCx and run by the same administrators, but are otherwise separate systems.
- If you've used BG/L already, BG/P is the same; if you're starting from scratch, BG/P is easier to port code to, and faster!
