Automated Configuration and Administration of a Storage-class Memory System to Support Supercomputer-based Scientific Workflows
1 Automated Configuration and Administration of a Storage-class Memory System to Support Supercomputer-based Scientific Workflows
J. Bernard (1), P. Morjan (2), B. Hagley (3), F. Delalondre (1), F. Schürmann (1), B. Fitch (4), A. Curioni (5)
(1) Blue Brain Project (BBP), Geneva, Switzerland
(2) IBM, Böblingen, Germany
(3) Swiss National Computing Center (CSCS), Lugano, Switzerland
(4) IBM, Yorktown Heights, NY, USA
(5) IBM, Zurich, Switzerland
2 Outline
- Why do we need a storage-class memory system?
- Blue Brain Project hardware system design
- Why do we need system management automation?
- First implementation supporting application user-defined system configuration
3 Example complex workflow
(diagram) The Visualization Cluster runs a Volume Renderer (which uses Field Voxelization) and a Report Reader; both read events from a Key-Value Store and from GPFS hosted on the BGAS nodes, while the HPC simulation running on the Compute nodes writes into them.
4 Why use storage-class memory?
- Multi-step, complex workflows require a lot of effort from scientific application users and developers: building the brain tissue model, simulating its electrical evolution, analysis of simulation results, visualization
- Brain modeling requires a large memory footprint: a rat brain is about 100 TB, and the estimate for a human brain is 100 PB
- DRAM alone is not cost effective, so a memory hierarchy is required
5 Outline
- Why do we need a storage-class memory system?
- Blue Brain Project hardware system design
- Why do we need system management automation?
- First implementation supporting application user-defined system configuration
6 BBP resources at CSCS
System overview:
- Blue Gene/Q: 4 racks of compute nodes (8 midplanes, 4096 nodes)
- 8 BG/Q production I/O drawers (64 nodes) and 8 BGAS I/O drawers (64 nodes)
- GSS storage cluster
- x86 compute cluster and Viz x86 compute cluster
Distributed management:
- CSCS storage team, CSCS BG team, BBP HPC & infrastructure team
7 BGAS I/O nodes compared to standard IONs
8 BGAS I/O nodes: hardware
- PCIe 2.0 x8
- InfiniBand replaced by 10 GbE optical cables between drawers
- <2,2,2> torus extended to <4,4,4>, potentially expandable to <8,8,8>
- 2 TiB SLC flash
9 BGAS I/O nodes: NVM user interfaces
- Direct storage access (DSA): OFED RDMA verbs provider; applications need to be modified
- Block devices based on DSA: ext4 block device; some overhead, but POSIX
- GPFS: NSDs communicate over iWARP; more overhead, but no data-locality worries, and POSIX
10 HS4 flash card partitioning
- 2 TiB raw capacity: 0.6 TiB reserved for wear-leveling, 1.4 TiB usable
- The usable capacity is divided into a GPFS flash partition (block device), an ext4 flash partition, and a DSA partition
- Native DSA interface; a verbs block device (VBD) on top of DSA provides block access
- The GPFS, ext4, and DSA partitions can each be sized from 0 to 100% of the usable capacity (a sizing sketch follows below)
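To make the sizing concrete, the shell sketch below turns a chosen percentage split into per-card byte counts; the 1.4 TiB usable figure comes from the slide, while the 50/25/25 split and the variable names are this sketch's own assumptions:

    # hypothetical sizing helper for one HS4 card
    USABLE=$((14 * 1024**4 / 10))            # ~1.4 TiB usable (2 TiB raw minus 0.6 TiB wear-leveling reserve)
    GPFS_PCT=50; EXT4_PCT=25; DSA_PCT=25     # each interface may take 0..100% of the usable capacity
    gpfs_bytes=$((USABLE * GPFS_PCT / 100))
    ext4_bytes=$((USABLE * EXT4_PCT / 100))
    dsa_bytes=$((USABLE * DSA_PCT / 100))
    echo "GPFS=${gpfs_bytes} ext4=${ext4_bytes} DSA=${dsa_bytes}"   # byte counts per node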
11 Outline
- Why do we need a storage-class memory system?
- Blue Brain Project hardware system design
- Why do we need system management automation?
- First implementation supporting application user-defined system configuration
12-15 Why do we need automation? (progressive build of one slide)
- The system is highly configurable, supporting different memory interfaces (GPFS, SKV)
- Automated partitioning based on user requirements enables fast application prototyping
- Each workflow stage (circuit building, simulation, analysis/visualization) gets its own SLURM queue, backed by the appropriate combination of SKV, GPFS, and ext4
16 What do we want to automate?
- System software management: new major releases and release updates
- Two-level partitioning: cluster partitioning and on-node flash memory partitioning
- Integration with the rest of the eco-system (Blue Gene/Q, x86 cluster, GSS storage)
17 System maintenance & update
BGAS sandbox creation workflow (major release):
- Build the sandbox and create the ramdisk
- Add RPMs
- Compile the GPFS kernel module
- Install Soft-iWARP
- Integrate with other services by copying config files into the sandbox: ssh, Kerberos, SLURM, environment modules (a shell sketch of these steps follows)
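As an illustration, the major-release workflow above could be scripted roughly as follows; every path, package name, and template location here is an assumption of this sketch rather than the actual BBP tooling (the GPFS portability-layer targets Autoconfig/World/InstallImages are the standard ones, the rest is invented):

    # build_bgas_sandbox.sh -- hypothetical sketch of a BGAS sandbox build
    set -e
    SANDBOX=/bgsys/sandboxes/bgas-$(date +%Y%m%d)        # assumed sandbox location
    mkdir -p "$SANDBOX"
    # 1. build the sandbox root and create the ramdisk image (template path is invented)
    cp -a /bgsys/drivers/ppcfloor/ramdisk-template/. "$SANDBOX"/
    # 2. add the RPMs the BGAS nodes need (package list is illustrative)
    rpm --root "$SANDBOX" -ivh gpfs.base-*.rpm slurm-*.rpm krb5-workstation-*.rpm
    # 3. compile the GPFS kernel (portability) module inside the sandbox tree
    ( cd "$SANDBOX"/usr/lpp/mmfs/src && make Autoconfig && make World && make InstallImages )
    # 4. install Soft-iWARP and integrate with other services by copying config files
    rpm --root "$SANDBOX" -ivh softiwarp-*.rpm
    cp /etc/krb5.conf /etc/krb5.keytab "$SANDBOX"/etc/
    cp /etc/ssh/sshd_config            "$SANDBOX"/etc/ssh/
    cp /etc/slurm/slurm.conf           "$SANDBOX"/etc/slurm/
    cp -r /etc/modulefiles             "$SANDBOX"/etc/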
18 Shell access to BGAS I/O nodes
- SSH with Kerberos from any other BBP user node: add /etc/krb5.conf and /etc/krb5.keytab to the sandbox
- DNS and /etc/hosts: sshd must return an FQDN that is consistent across all user-accessible nodes (Viz cluster, BGAS nodes, BG/Q and BGAS front-end nodes, BBP desktops), as illustrated below
- Access will be limited to users with running jobs once fully productionized
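A hedged illustration of what that means in practice; the node name bbpbgas001 and the sandbox path are invented for this example, and the real sshd settings may differ:

    # Kerberos client configuration goes into the sandbox image (see the build sketch above)
    cp /etc/krb5.conf /etc/krb5.keytab "$SANDBOX"/etc/
    # sshd on the BGAS nodes should accept GSSAPI (Kerberos) authentication
    grep GSSAPIAuthentication "$SANDBOX"/etc/ssh/sshd_config   # expect: GSSAPIAuthentication yes
    # every user-accessible node must resolve the same FQDN for a BGAS node,
    # and that FQDN must match the node's Kerberos host principal
    getent hosts bbpbgas001                                    # hypothetical node name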
19 User-defined configuration parameters
Basic configuration parameters at partitioning time (an illustrative parameter file follows this list):
- How many clusters: 1 to 8 BGAS clusters
- How many nodes per cluster: 8 to 64 nodes
- How much flash allocated to DSA: 0 to 100%
- How much flash allocated to local ext4: 0 to 100%
- How much flash allocated to GPFS: 0 to 100%
Advanced configuration (GPFS only):
- GPFS page pool: 1 to 8 GB
- GPFS block size: 64 KB to 4 MB
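As an illustration, a user's request could be captured in a small parameter file like the one below; the file format and key names are invented for this sketch and are not the project's actual interface:

    # bgas_partition.conf -- hypothetical partitioning request
    clusters=2                 # 1..8 BGAS clusters
    nodes_per_cluster=32       # 8..64 nodes per cluster
    flash_dsa_pct=25           # share of the usable flash given to raw DSA (0..100%)
    flash_ext4_pct=25          # share given to the node-local ext4 block device (0..100%)
    flash_gpfs_pct=50          # share given to the per-cluster GPFS file system (0..100%)
    # advanced parameters, GPFS only
    gpfs_pagepool=4G           # 1..8 GB
    gpfs_blocksize=1M          # 64 KB .. 4 MB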
20 Overview of partitioning workflow
- On the service node: free the block, boot the block
- On all I/O nodes in the block: partition the flash; partition and set up ext4
- On the first node of each block: set up GPFS
- Integration: grant remote access to the Viz cluster
- Integration: set up remote mounts from the GSS cluster (the whole sequence is sketched below)
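Wired together from the service node, the workflow might look like the sketch below; free_io_block, boot_io_block, nodes_in_block, partition_flash.sh, setup_gpfs.sh, grant_viz_access.sh, and mount_gss_remote.sh are placeholders for the real tools, not documented Blue Gene or BBP commands:

    BLOCK=I0-32                               # I/O block requested by the user
    free_io_block "$BLOCK"                    # service node: free the block
    boot_io_block "$BLOCK"                    # service node: boot it with the BGAS image
    NODES=$(nodes_in_block "$BLOCK")          # expand the block into its I/O node names
    for n in $NODES; do                       # all I/O nodes: carve the flash, set up ext4
        ssh "$n" partition_flash.sh --dsa 25 --ext4 25 --gpfs 50
    done
    FIRST=$(set -- $NODES; echo "$1")         # first node of the block
    ssh "$FIRST" setup_gpfs.sh "$BLOCK"       # create NSDs and the file system, mount it
    ssh "$FIRST" grant_viz_access.sh "$BLOCK" # integration: allow the Viz cluster to mount
    ssh "$FIRST" mount_gss_remote.sh          # integration: remote-mount the GSS file systems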
21 Partitioning BGAS nodes into I/O blocks
- Block names such as I0-64, I0-32, I4-32, I6-16, I0-48, etc.
- 15 possible I/O blocks, connected via the <4,4,4> 3D torus:
- 8-drawer / 64-node block: drawers 0-7
- 4-drawer / 32-node blocks: drawers 0-3, 4-7
- 2-drawer / 16-node blocks: drawers 0-1, 2-3, 4-5, 6-7
- 1-drawer / 8-node blocks: drawers 0, 1, 2, 3, 4, 5, 6, 7
22 Compute node partitions
(diagram: compute-node partitions mapped to the I/O blocks I0-64, I0-48, I0-32, I4-32, and I6-16)
23 BGAS GPFS cluster creation
Clusters are identified and authenticated using:
- the cluster name
- an automatically generated cluster ID
- an automatically generated SSL certificate that authenticates the name and ID
24 BGAS GPFS remote cluster access (integrating with the rest of the system)
Remote access requires:
- key generation
- certificate exchange
- mmauth on the server cluster
- mmremote* commands on the client cluster
But all of these require root, and avoiding a fresh mmauth update on every repartition depends on the cluster ID not changing. The corresponding GPFS commands are sketched below.
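On the GPFS side, the exchange the slide describes corresponds roughly to the commands below (the commands themselves are standard GPFS administration commands, but the cluster names, contact node, key path, and file system names are illustrative); every one of them has to run as root, which is exactly the constraint that motivates the automation:

    # on the BGAS (server) cluster: generate a key pair and hand the public part to the admins
    mmauth genkey new
    scp /var/mmfs/ssl/id_rsa.pub gss-admin:/tmp/bgas-I0-32.pub
    # on the GSS cluster: authorize the BGAS cluster and grant it access to a file system
    mmauth add bgas-I0-32 -k /tmp/bgas-I0-32.pub
    mmauth grant bgas-I0-32 -f gss_fs0
    # on the Viz (client) cluster: register the remote BGAS cluster and its file system
    mmremotecluster add bgas-I0-32 -n bbpbgas001 -k /tmp/bgas-I0-32.pub
    mmremotefs add bgas_fs -f bgas_fs -C bgas-I0-32 -T /gpfs/bgas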
25 BGAS GPFS cluster creation, step 1 (integrating with the rest of the system)
Set up all 15 clusters once, one for each possible BGAS I/O block:
- extract and save the cluster names, IDs, and certificates
- exchange certificates with the GSS and Viz cluster admins
- the GSS cluster authorizes each BGAS cluster
- each BGAS cluster authorizes mounts by the Viz cluster
- the Viz cluster adds each cluster and its file system
- delete the clusters (they are re-created with the saved identity at partitioning time, see step 2)
26 BGAS GPFS cluster creation, step 2 (integrating with the rest of the system)
At partitioning time, re-create the cluster and restore its saved identity:
    mmcrcluster
    mmauth genkey new
    <TOTALLY UNSUPPORTED>
      cp -af $CERT_DIR/* /var/mmfs/ssl
      new_cluster_id=$(mmlsconfig clusterid)
      sed "s/${old_cluster_id}/${new_cluster_id}/" mmfs.cfg
      mmauth genkey propagate
    </TOTALLY UNSUPPORTED>
27 Sharing scripts and public certificates (integrating with the rest of the system)
- git repository hosted at EPFL
- commit access for BBP and CSCS
- automated checkout by Puppet on the Viz cluster
- checkout by a non-root user with read-only access
28 Mounting BGAS GPFS on the Viz cluster (integrating with the rest of the system)
- BGAS GPFS file systems come and go, and Viz nodes get rebooted
- When a BGAS file system is created, touch a status file on a GSS file system; every N minutes, check the status file
- If its mtime is less than N minutes old, or the node uptime is less than N minutes: mmlsfs && mmmount, otherwise mmumount
- Run as a non-root admin user via sudo (a cron-style sketch follows)
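Expressed as a small cron-driven script on the Viz nodes, it could look like the sketch below; the status-file path, the 5-minute window standing in for "N minutes", the file system name, and the sudo setup are assumptions of this sketch:

    # run every few minutes from cron as the non-root GPFS admin user (sudo rules assumed)
    N=300                                      # "N minutes" from the slide, here 5 minutes in seconds
    STATUS=/gss/bgas/status/bgas_fs.alive      # touched on a GSS file system when a BGAS fs is created
    now=$(date +%s)
    mtime=$(stat -c %Y "$STATUS" 2>/dev/null || echo 0)
    uptime_s=$(cut -d. -f1 /proc/uptime)
    if [ $((now - mtime)) -lt "$N" ] || [ "$uptime_s" -lt "$N" ]; then
        # a BGAS file system appeared recently, or this Viz node just rebooted:
        # mount it if it exists, otherwise make sure any stale mount is gone
        sudo /usr/lpp/mmfs/bin/mmlsfs bgas_fs >/dev/null 2>&1 \
            && sudo /usr/lpp/mmfs/bin/mmmount bgas_fs \
            || sudo /usr/lpp/mmfs/bin/mmumount bgas_fs
    fi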
29 Repartitioning performance
(chart: boot time and partition time, in seconds)
30 Outline
- Why do we need a storage-class memory system?
- Blue Brain Project hardware system design
- Why do we need system management automation?
- First implementation supporting application user-defined system configuration
31 User experience: expected workflow
Configuring BGAS:
- Configuring BGAS according to multiple teams' needs (multi-tenancy)
- Configuring a BGAS cluster according to one team's needs
Using BGAS for fast scientific development:
- from IBM Blue Gene/Q
- from the Viz cluster
- from BGAS itself, as a regular cluster
32 Expected user development cycle (configuring BGAS)
- A super-user (manager, PI) decides how the cluster should be partitioned based on several teams' needs (time scale: a few weeks)
- Team developers decide how they want the flash of their cluster partitioned at job submission time (time scale: a few days)
33 Using BGAS from Blue Gene/Q: switching I/O links automatically
- The BGAS queue is seen as a regular queue
- Jobs run on CNK compute nodes
- I/O is routed automatically to BGAS nodes instead of the production I/O nodes
$ sinfo
PARTITION AVAIL JOB_SIZE TIMELIMIT CPUS S:C:T NODES STATE MIDPLANELIST
debug* down :00:00 8K 512: idle bgq1011
test up :00:00 8K 512: allocated bgq1001
prod up 512-2K 7-00:00:00 8K 512: drained bgq1000
prod up 512-2K 7-00:00:00 8K 512: K allocated bgq[0000x0011,1001]
prod-large up 1-4K 2-12:00:00 8K 512: drained bgq1000
prod-large up 1-4K 2-12:00:00 8K 512: K allocated bgq[0000x0011,1001]
prod-large up 1-4K 2-12:00:00 8K 512: idle bgq[1010x1011]
bgas up 1-4K 1-00:00:00 8K 512: drained bgq1000
bgas up 1-4K 1-00:00:00 8K 512: K allocated bgq[0000x0001,1001]
bgas up 1-4K 1-00:00:00 8K 512:16 2K idle bgq[0010x1011]
34 Using BGAS from Blue Gene/Q: switching I/O links automatically
Switching between I/O nodes:
- Compute nodes are cabled to both sets of IONs, but only one link can be active
- Compute nodes need to be rebooted to switch
SLURM prolog:
- Check whether the partition is bgas
- Are the requested BGAS IONs already linked? If not, deallocate the compute nodes and switch the links
- Restart the job and boot the compute nodes
- Leave the links in place afterwards to minimize rebooting (see the prolog sketch below)
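In pseudo-shell, the prolog decision reduces to the sketch below; bgas_links_active, deallocate_block, switch_io_links, and boot_block stand in for the site-specific tooling that actually reroutes the links, and the SLURM variables shown are only available to the prolog depending on SLURM version and configuration:

    # SLURM prolog sketch -- placeholders throughout
    if [ "$SLURM_JOB_PARTITION" = "bgas" ]; then
        if ! bgas_links_active "$SLURM_JOB_NODELIST"; then   # requested BGAS IONs already linked?
            deallocate_block "$SLURM_JOB_NODELIST"           # free the compute nodes
            switch_io_links  "$SLURM_JOB_NODELIST"           # route I/O to the BGAS drawers
            boot_block       "$SLURM_JOB_NODELIST"           # reboot compute nodes, restart the job
        fi
        # links are deliberately left in place afterwards to minimize future reboots
    fi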
35 Using BGAS as an independent cluster
- Log in to a BGAS front-end node
- Use the queue of your BGAS cluster: all created queues are visible, but only the queues of created clusters are up
$ sinfo
PARTITION AVAIL JOB_SIZE TIMELIMIT CPUS S:C:T NODES STATE NODELIST
bgas down :00:00 1 1:12:1 8 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 8 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 8 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 8 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 8 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 8 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 8 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 8 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 16 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 16 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 16 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 16 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 32 idle bbpbgas[ ]
bgas down :00:00 1 1:12:1 32 idle bbpbgas[ ]
bgas up :00:00 1 1:12:1 64 idle bbpbgas[ ]
36 GPFS IOR performance, reads, MiB/node
(chart: read bandwidth for 1, 2, and 4 processes per node; RDMA over the roq (iWARP) interface, 1 MB blocks)
37 GPFS IOR performance, writes, MiB/node
(chart: write bandwidth for 1, 2, and 4 processes per node; RDMA over the roq (iWARP) interface, 1 MB blocks)
38 Next steps
- Further integration to increase automation: integration of BGAS cluster partitioning with SLURM, integration of flash partitioning with SLURM, complete integration of all services
- Getting user feedback & experience
- Enhancement & addition of new services: performance benchmarking of data store interfaces (SKV, ...), automated data transfer/copy between BGAS & GSS, multicluster allocation via co-scheduling to support complex workflow execution