Scalable I/O, File Systems, and Storage Networks R&D at Los Alamos LA-UR /2005. Gary Grider CCN-9



2 Background

3 Parallel I/O: What drives us? Provide reliable, easy-to-use, high-performance, scalable, and secure I/O via standard interfaces (MPI-IO and POSIX). Balanced system approach: computing speed (FLOP/s), memory (TeraBytes), parallel I/O to disk (GigaBytes/sec), archival storage (GigaBytes/sec), and network speed (Gigabits/sec) must scale together, and application performance also depends on metadata inserts/sec. [Diagram: the past-generation ASCI platform, a disk-rich supercomputer with disk-poor clients, versus the current-generation ASCI platform with diskless clients, clusters and a job queue sharing a scalable OBFS, an enterprise global file system, file system gateways, an object archive, a backbone, and the site backbone.]

4 How did we get here? Through the 5-year ASCI planning process we realized that we didn't have a good scalable file system and I/O story: few solutions existed, none were heterogeneous, none were open source, none were based on standards, and none were secure on a public network. We also saw Linux clusters coming, and there were NO solutions on the horizon for Linux. How could we proceed given the enormous cost of developing file and storage system solutions? Through leverage, partnerships, careful planning, and did I mention leverage? Vendors, standards bodies, universities, and the HPC community. We needed a tri-lab File System Requirements document: ftp://ftp.lanl.gov/public/ggrider/ascifsrfp.doc

5 Historical Time Line for ASC I/O R&D: proposed a Path Forward activity for SGPFS; proposed an initial architecture; built an initial requirements document; held the SGPFS workshop ("You are crazy"); completed the tri-lab joint requirements document; Path Forward proposal with an OBSD vendor, Panasas born; Path Forward project to pursue RFI/RFQ, analysis, and recommendations; funded open-source OBSD development and NFSv4 projects; placed Alliance contracts with universities on OBSD, small/unaligned/unbalanced/overlapped I/O, and NFSv4; Path Forward Lustre project born; U of Minnesota Object Archive begins; pNFS effort begins; let's reinvent POSIX I/O?

6 Leveraging vendor collaborations: solutions for Linux clusters and enterprise-class heterogeneous global parallel file systems. HP/CFS/Intel Lustre Path Forward for an object-based secure global parallel file system: very scalable bandwidth, metadata scaling work started; being used at LLNL, with initial deployment at Sandia. Panasas: very scalable bandwidth, metadata scaling work started; being used at LANL, Sandia, and LLNL; deployed in our Red, Yellow, and Turquoise networks on >1000-node clusters, even shared between clusters.

7 Leveraging university partnerships to fill in the gaps! Michigan: assisting in the design and testing of NFSv4 for multi-domain-enabled secure NFS, NFS parallel extensions, NFS in a non-IP environment, NFS as a client to SAN and OBFS file systems, and NFS server load balancing in a cluster file system setting; some results are showing up in the Linux 2.5/2.6 kernel, and pNFS IETF work is starting. Northwestern: move coordination up into MPI-IO and out of the file system for overlapped, unaligned, and small I/O. UCSC: study of security and clustered metadata scaling to guide our two object file system design and development activities. Minnesota (Intelligent Storage Consortium) and STK: develop the first infinite (HSM) object-based device to enable a parallel object archive, leveraging existing OBFSs and commercial HSM software (multiple copies in parallel).

8 Leveraging collaborations with other labs. Our partners LLNL and Sandia: basically everything we do is coordinated with them. ANL: MPICH MPI-2 reference, advanced MPI-IO, PVFS2 explorations, NetCDF, GridFTP. NCSA: HDF.

9 LANL Deployments. Finally we have truly capitalized on our 5+ year file system investments: Turquoise (institutional and Alliance computing), Yellow (unclassified programmatic computing), and Red (classified programmatic computing).

10 Turquoise Configuration [diagram]: Pink (1916 Xeon procs) and a future institutional machine (FY05) connect over Myrinet to 64 I/O nodes, then over Gig-E to Panasas (16 storage shelves, 80 TB, ~6.4 GB/s); Mauve (Altix cluster, 256 procs) and TLC (112 compute nodes, 224 AMD64 procs) connect through 16 I/O nodes over Myrinet.

11 Yellow Configuration [diagram]: Flash (728 AMD64 procs) connects over Myrinet to 64 I/O nodes, then over Gig-E to Panasas (8 storage shelves, 40 TB, ~3.2 GB/s); future 1024-proc AMD64 machine (FY05); future viz cluster (FY06); future direct HPSS movement agents (FY05).

12 Red Configuration [diagram]: Lightning (3072 AMD64 procs) connects over Myrinet to 128 I/O nodes, then over Gig-E to Panasas (48 storage shelves, 200 TB, ~20 GB/s); future Viewmaster viz cluster (FY05/06); future capacity machine (FY05 and FY06/7); future direct HPSS movement agents (FY05 ASAP).

13 Pink Data Throughput

14 Pink Metadata Throughput

15 The Middleware Story. Now that we have all this scalable parallel file system plumbing, let's install some fixtures (help the apps take advantage)!

16 The I/O stack: Application; UDM (meshes, coordinate systems, etc.); HDF5 formatting layer; MPI-IO; scalable global parallel file system with POSIX interface.
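To make the stack concrete, here is a minimal sketch in C, not taken from the slides, of an application writing one shared dataset through the HDF5 layer, which drives MPI-IO, which in turn talks to the parallel file system; the file name, dataset name, and per-rank size are illustrative assumptions.

```c
/* Minimal sketch of the I/O stack in use: application data goes through the
 * HDF5 formatting layer, which drives MPI-IO, which talks to the global
 * parallel file system.  File/dataset names and sizes are illustrative. */
#include <mpi.h>
#include <hdf5.h>

#define N 1024                      /* doubles written per process (assumed) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double buf[N];
    for (int i = 0; i < N; i++) buf[i] = rank;       /* fake application data */

    /* File access property list: route HDF5 file I/O through MPI-IO. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("stack_demo.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One shared dataset; each rank owns a contiguous slab of it. */
    hsize_t dims[1]  = { (hsize_t)nprocs * N };
    hsize_t start[1] = { (hsize_t)rank * N };
    hsize_t count[1] = { N };
    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate(file, "data", H5T_NATIVE_DOUBLE, filespace,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(1, count, NULL);

    /* Collective transfer lets MPI-IO aggregate the slabs before hitting disk. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```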

17 Example of well-aligned I/O [diagram: processes 1-4 writing to a parallel file striped across RAID Groups 1-4]. Writes can be aggregated/cached on clients to make enough contiguous data to do full-stride parity writes, eliminating read/update/write, and some processes write to RAID Group 1 at the same time as others write to the other RAID Groups, in PARALLEL; additionally, well-balanced I/O means all procs write the same amount.
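As a hypothetical illustration of this well-aligned, well-balanced pattern (the 1 MiB stripe width, file name, and offsets below are assumptions, not values from the slides), each process writes exactly one full stripe at a stripe-aligned offset:

```c
/* Hypothetical sketch of well-aligned, well-balanced I/O: every rank writes
 * one full RAID stripe at a stripe-aligned offset, so no stripe needs a
 * read/update/write and the RAID groups are hit in parallel.
 * The 1 MiB stripe width and file name are assumptions for illustration. */
#include <mpi.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t stripe = 1 << 20;              /* assumed full-stripe width */
    char *buf = malloc(stripe);
    memset(buf, rank, stripe);                  /* fake data */

    int fd = open("aligned.out", O_CREAT | O_WRONLY, 0644);
    /* Each rank's region starts exactly on a stripe boundary. */
    pwrite(fd, buf, stripe, (off_t)rank * (off_t)stripe);
    close(fd);

    free(buf);
    MPI_Finalize();
    return 0;
}
```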

18 As applications started running, not just porting and testing, poor-alignment and extremely-small-write problems were identified [diagram: processes 1-4 all landing on RAID Group 1 of the parallel file]. Very small, unbalanced, and unaligned writes, much smaller than a full RAID stripe: every process queues up to read/update/write a partial block, all serialized on RAID Group 1; then everyone moves on and does it again, eventually all serializing on RAID Group 2, and so on.
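The shape of that problem pattern can be sketched as below; the record size, record count, and file name are assumptions chosen for illustration, not measurements from the slides.

```c
/* Hypothetical sketch of the problem pattern: every rank independently writes
 * small records, far below the full stripe width, interleaved by rank.  Each
 * write forces a read/update/write of a partial stripe, and at any moment all
 * ranks tend to be queued on the same RAID group.
 * Record size, record count, and file name are illustrative assumptions. */
#include <mpi.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const size_t rec = 512;                       /* tiny, unaligned record */
    char buf[512] = { 0 };
    int fd = open("unaligned.out", O_CREAT | O_WRONLY, 0644);

    for (int step = 0; step < 1000; step++) {
        /* Ranks interleave 512-byte records: offsets are never stripe aligned,
         * and neighbouring ranks keep hitting the same partial stripe. */
        off_t off = ((off_t)step * nprocs + rank) * (off_t)rec;
        pwrite(fd, buf, rec, off);
    }

    close(fd);
    MPI_Finalize();
    return 0;
}
```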

19 We are working with the applications to utilize smart PSE/Path Forward investments in MPI-IO and file system tune-ability [diagram: processes 1-4 funneling through collective-buffering (CB) procs to the parallel file on RAID Groups 1-4]. Leveraging Argonne and Northwestern work, collective two-phase buffered I/O helps align, balance, and aggregate. Adjustable file geometries in the Panasas file system allow the file system to accommodate the file geometry by changing only hints. A caching layer is the next step!
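A minimal sketch of what that looks like at the MPI-IO level follows: the collective write and the ROMIO-style hints are real interfaces, but which hints are honored, and the right values, depend on the MPI library and file system, so the numbers and file name here are illustrative assumptions.

```c
/* Minimal sketch of collective, two-phase buffered I/O through MPI-IO.
 * The hints below are the common ROMIO-style ones; which hints are honored
 * (and the right values) depend on the MPI library and file system, so treat
 * the numbers as illustrative assumptions. */
#include <mpi.h>

#define N 512                         /* ints per process, illustrative */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf[N];
    for (int i = 0; i < N; i++) buf[i] = rank;

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_cb_write", "enable");  /* force collective buffering */
    MPI_Info_set(info, "cb_nodes", "8");             /* number of aggregator procs */
    MPI_Info_set(info, "cb_buffer_size", "8388608"); /* 8 MiB aggregation buffers */
    MPI_Info_set(info, "striping_unit", "1048576");  /* ask for 1 MiB stripe units */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "collective.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* Collective write: small per-rank pieces are exchanged and re-written as
     * large, aligned chunks by the aggregators (the "CB procs" in the slide). */
    MPI_Offset off = (MPI_Offset)rank * N * (MPI_Offset)sizeof(int);
    MPI_File_write_at_all(fh, off, buf, N, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```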

20 In deploying a multi-cluster shared global parallel file system we needed a scalable and reliable I/O network: PaScalBB (Parallel Scalable Back Bone) [diagram: disk-poor and diskless clusters with a job queue, file system gateways, and a backbone connecting them to the scalable OBFS, the enterprise global FS, and the object archive].

21 Phase 1 [diagram]: Panasas storage shelves/blades sit on a private subnet used for broadcast functions, behind an Extreme 6808 Gig-E switch; per-subnet virtual routers running OSPF/ECMP carry traffic to Panasas (load-balanced/OSPF) and from Panasas (OSPF ECMP); other services include NFS front ends and the route to the site network. Myrinet compute nodes carry a default route of color.1 with nexthop color.2, equal-weight, with gctimeout failover.

22 Phase 1 Reverse [diagram]: the same Phase 1 topology shown for the return path: Panasas storage shelves/blades on the private broadcast subnet behind the Extreme 6808 Gig-E switch; per-subnet virtual routers running OSPF/ECMP to and from Panasas; NFS front ends and other services; route to the site network. Myrinet compute nodes carry a default route of color.1 with nexthop color.2, equal-weight, with gctimeout failover.

23 Phase 3, single file system [diagram]: LANE 1 through LANE N, each on its own subnet and built from layer 2 switches; traffic to Panasas is load-balanced, and traffic from Panasas uses OSPF/load balancing; other services include NFS front ends and the route to the site network. Myrinet compute nodes carry, for each color, a route of color.1 with nexthop color.2, equal-weight, with gctimeout failover.

24 Phase 3 Reverse, single file system [diagram]: the same multi-lane topology shown for the return path: LANE 1 through LANE N on separate subnets over layer 2 switches, load-balanced traffic to Panasas and OSPF/load-balanced traffic from Panasas, NFS front ends and other services, and the route to the site network; compute nodes keep the same per-color equal-weight routes with gctimeout failover.

25 Phase 4, multi-lane environment, totally scalable with failover [diagram: clusters A, B, and C, each with compute nodes and I/O nodes, connected through layer 2 switches to the global parallel file system, the archive, and site services such as KRB, DNS, and NFS, with a link to the site network].

26 You have already heard about our vendor relationships and ongoing university partnerships. So what is new?

27 Today the FS client-to-storage protocol is TCP/IP sockets [diagram]. Physical view: compute nodes connect over Myrinet to I/O nodes, then over Gig-E to storage. Logical view: a user-space application on a compute node calls write(fd, buffer, len); the user buffer is cached in kernel I/O pages; cache pressure or time causes movement to storage, with socket threads 1-4 each driving a TCP/IP socket to a different storage device (devices 2, 11, 18, 25).
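A small POSIX sketch of the client-side behavior described above (file name and buffer size are illustrative): write() returns once the data is in kernel pages, and the data only moves to storage later, under cache pressure, on a timer, or when the application forces it.

```c
/* Sketch of the client-side write path described above: write() completes as
 * soon as the data is cached in kernel I/O pages; the data moves to storage
 * later (cache pressure, timer) or when the application forces it with fsync.
 * File name and buffer size are illustrative. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buffer[64 * 1024];
    memset(buffer, 'x', sizeof buffer);

    int fd = open("client_write.out", O_CREAT | O_WRONLY | O_TRUNC, 0644);

    /* Returns after copying into the kernel page cache, not after the bytes
     * have crossed the TCP/IP sockets to the storage devices. */
    write(fd, buffer, sizeof buffer);

    /* Force the cached pages out to storage now instead of waiting for
     * cache pressure or the periodic writeback timer. */
    fsync(fd);

    close(fd);
    return 0;
}
```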

28 Let's dig a bit deeper and look at protocols. ANSI T10 OSD is an extension to IETF iSCSI, and iSCSI is encapsulated in TCP/IP. Logical view: a user-space application calls write(fd, buffer, len) on the compute node; the user buffer is cached in kernel I/O pages; cache pressure or time causes movement to storage over socket threads 1-4 to devices 2, 11, 18, and 25. Protocol view of one such transfer (this part can be quite big): IETF IP info, IETF TCP info, IETF iSCSI CMD, the ANSI T10 OSD iSCSI extension (security capability, object ID, variable length), the write (offset/length), the iSCSI SCSI DATA payload, and footers.

29 Server-Side Pull / Scheduled Transfer. A user-space application calls write(fd, buffer, len); the user buffer is cached in kernel I/O pages; cache pressure or time causes movement to storage. Small transfers go as before over socket threads 1-4 to devices 2, 11, 18, and 25: IETF IP info, IETF TCP info, IETF iSCSI CMD, the ANSI T10 OSD iSCSI extension (security capability, object ID, variable length), the write (offset/length), the SCSI DATA payload, and footers. For large transfers the server (storage) remotely maps the client's memory via RDMA and moves the data for the client, completely offloaded and scheduled by the server: the command carries IETF IP info, IETF TCP info, IETF iSCSI CMD, the ANSI T10 OSD iSCSI extension (security capability, object ID, variable length), the write (offset/length), a client remote memory map, and footers.

30 The Future Storage Protocol Story: IETF NFSv4 for secure and performing metadata; IETF pNFS for a data path securely and directly to OSD devices; the IETF pNFS data path is iSER (RDMA) carrying ANSI OSD-enhanced iSCSI, running on media like Ethernet(s) and InfiniBand.

31 POSIX IO Additions. Most of the current issues with POSIX IO semantics lie in the lack of support for distributed/parallel processes. Concepts that involve implied ordering need alternative verbs that do not imply ordering: vectored read/write calls that don't imply ordering, extending-the-end-of-file issues, group opens, etc. Concepts that involve serialization for strict POSIX metadata query/update need lazy POSIX alternatives: last update times (mtime, atime, ctime), size of file. Status: LANL/tri-labs joined The Open Group, which holds the Single UNIX charter that merges IEEE, ANSI, and POSIX standards and evolves the POSIX standard for UNIX. The next step is to write up the proposed changes and begin the discussion process within the POSIX UNIX IO API working group in the Open Group forum. This process has begun with reps from DOE Office of Science, DOD, DOE NNSA, IBM, Panasas, CMU, U of Minnesota, UCSC, etc.
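To illustrate the "lazy metadata" idea, here is a hypothetical sketch: the statlite-style name, mask bits, and struct layout are illustrative inventions, not the actual proposal text. The caller marks which fields must be exact; the rest may come back stale or approximate, letting the file system skip global serialization.

```c
/* Hypothetical illustration of a "lazy" POSIX metadata call: the caller says
 * which fields must be exact; the rest may be returned stale or approximate so
 * the file system can skip global serialization.  The name statlite_demo, the
 * mask bits, and the struct layout are illustrative, not the actual proposal. */
#include <stdio.h>
#include <sys/stat.h>

#define LITE_SIZE  0x1   /* st_size must be exact  */
#define LITE_MTIME 0x2   /* st_mtime must be exact */
#define LITE_ATIME 0x4   /* st_atime must be exact */

struct statlite {
    struct stat  sb;          /* ordinary stat fields                */
    unsigned int exact_mask;  /* which fields the server made exact  */
};

/* Mock implementation: a real one would let the file system avoid gathering
 * fields the caller did not ask for; here we just fall back to stat() and
 * report everything as exact. */
static int statlite_demo(const char *path, unsigned int want, struct statlite *out)
{
    if (stat(path, &out->sb) != 0)
        return -1;
    out->exact_mask = want | LITE_SIZE | LITE_MTIME | LITE_ATIME;
    return 0;
}

int main(void)
{
    struct statlite sl;
    /* Only the size needs to be exact; mtime/atime may be lazily maintained. */
    if (statlite_demo("/etc/hosts", LITE_SIZE, &sl) == 0)
        printf("size=%lld exact_mask=0x%x\n",
               (long long)sl.sb.st_size, sl.exact_mask);
    return 0;
}
```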

32 Multi-Agency I/O R&D Steering. Authored a joint R&D needs document for the next five years of needed I/O and file systems R&D work (DOE NNSA/Office of Science, DOD NSA). Working with DARPA on how to spin up more I/O R&D for HPC (as part of high productivity computing, etc.). Working with NSF on how to spin up more I/O R&D for HPC (under the HECURA effort or more generally).
