How to Design a Cluster

Dana Brunson
Asst. VP for Research Cyberinfrastructure, Director, Adjunct Assoc. Professor, CS & Math Depts.
Oklahoma State University
http://hpcc.okstate.edu

"It depends." -- Henry Neeman

What is a Cluster?
"[W]hat a ship is... It's not just a keel and hull and a deck and sails. That's what a ship needs. But what a ship is... is freedom."
-- Captain Jack Sparrow, Pirates of the Caribbean
Credit: Henry Neeman

What a Cluster is
A cluster needs a collection of small computers, called nodes, hooked together by an interconnection network (or interconnect for short). It also needs software that allows the nodes to communicate over the interconnect. But what a cluster is... is all of these components working together as if they're one big computer... a supercomputer.
Credit: Henry Neeman

An Actual Cluster
[Photo of a cluster with the interconnect and the nodes labeled.] Also named Boomer, in service 2002-5.
Credit: Henry Neeman

Considerations
- Your budget
- Power, space, and cooling
- Your researchers' workload
- Your funding cycle
- Your staff
- What else?

OSU's considerations
- Budget: ~$1.3M
- Building out new power & cooling
- Workload is a mix of single-core jobs, shared-memory jobs, and jobs up to ~512 cores
- Funding cycle is one big purchase every 4-ish years (thanks NSF!)
- Staff = 1 dedicated person for combined sysadmin, user support, and application installation

Components overview
- Compute nodes (standard, large memory, accelerated)
- Storage (slow & fast)
- Interconnects (slow & fast)
- Login nodes
- Management nodes
- Other? (Data transfer node, web interfaces, etc.)

Compute nodes
- Standard compute nodes
  - Processor type
  - RAM (speed, # channels to fill)
  - As cheap as possible while still making users happy
- Special compute nodes
  - Large memory
  - Accelerators
- Choices depend on your users' needs and the sweet spot in pricing.

OSU's latest basic outline: Compute nodes
Compute nodes (depends on sweet spot of pricing):
- Standard: processor 2620 or better, 32-64 GB RAM
- Large memory: one 1 TB RAM, 4x 256 GB
- GPU nodes (specs will come from the researcher wanting them)
- We already have Xeon Phi from a recent purchase

Storage
- Scratch: Do you need a parallel filesystem?
  - Needs staff and/or great support (expensive)
  - Size depends on workload and purge policy
- Home: small and simple, and often backed up
- Work: big, not too slow
- Archive: PetaStore (what do others do?)
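A rough way to reason about "size depends on workload and purge policy": in steady state, scratch holds about as much data as users write during one purge window. The sketch below is only a back-of-the-envelope estimate. The function name, the 25% headroom default, and the example numbers are illustrative assumptions, not figures from the talk.

```python
def scratch_capacity_tb(daily_write_tb, purge_days, headroom=0.25):
    """Estimate scratch capacity in TB.

    daily_write_tb: average data written to scratch per day (measure this
                    on your current system if you can).
    purge_days:     files older than this many days are purged.
    headroom:       assumed extra fraction kept free so the filesystem
                    does not run full (0.25 is an arbitrary default).
    """
    if daily_write_tb < 0 or purge_days <= 0 or headroom < 0:
        raise ValueError("inputs must be non-negative and purge_days > 0")
    return daily_write_tb * purge_days * (1 + headroom)


if __name__ == "__main__":
    # Hypothetical workload: 2 TB/day written, 30-day purge policy.
    print(f"{scratch_capacity_tb(2.0, 30):.0f} TB of scratch")
```

Halving the purge window halves the estimate, which is why the purge policy is worth settling before the hardware is specified.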

OSU's latest basic outline: Storage
- Home: ~20 TB storage appliance x3 (we want this redundant-ish)
- Scratch: 100 TB appliance with an 800 number on it (unclear if we can get it this small)
- Work: 1 PB (servers full of disks, NFS, RAID6, cheap and simple)
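For the "servers full of disks, RAID6" approach, it helps to remember that each RAID6 group gives up two disks' worth of space to parity. The sketch below estimates how many groups and raw disks a usable-capacity target needs. The group width (12) and disk size (8 TB) are assumptions for illustration, and the estimate ignores hot spares, filesystem overhead, and the TB vs. TiB difference.

```python
import math


def raid6_usable_tb(disks_per_group, disk_tb):
    """Usable capacity of one RAID6 group: two disks' worth goes to parity."""
    if disks_per_group < 4:
        raise ValueError("RAID6 needs at least 4 disks per group")
    return (disks_per_group - 2) * disk_tb


def groups_needed(target_tb, disks_per_group, disk_tb):
    """Smallest number of RAID6 groups whose usable space meets the target."""
    return math.ceil(target_tb / raid6_usable_tb(disks_per_group, disk_tb))


if __name__ == "__main__":
    # Hypothetical build: 1 PB (1000 TB) usable from 12-disk groups of 8 TB disks.
    groups = groups_needed(1000, 12, 8)
    print(f"{groups} groups, {groups * 12} raw disks")
```

Wider groups waste less space on parity but take longer to rebuild after a disk failure, so the group width is a trade-off rather than a free parameter.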

Interconnects
- Gigabit Ethernet: workhorse
- InfiniBand/Omni-Path
  - Depends on workload (oversubscription can save money if your workload doesn't have many large parallel jobs)
- IPMI network for out-of-band management
  - Worth the small expense
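Oversubscription is usually quoted as the ratio of node-facing ports to uplink ports on a leaf switch. The sketch below computes that ratio for a few ways of splitting one switch. It assumes every port runs at the same speed, and the 36-port switch in the example is a hypothetical size, not a recommendation.

```python
from fractions import Fraction


def oversubscription(node_ports, uplink_ports):
    """Ratio of node-facing ports to uplink ports on one leaf switch.

    1 means non-blocking; 2 means nodes can offer twice the bandwidth
    that the uplinks can carry out of the switch.
    """
    if node_ports <= 0 or uplink_ports <= 0:
        raise ValueError("port counts must be positive")
    return Fraction(node_ports, uplink_ports)


if __name__ == "__main__":
    # Hypothetical 36-port leaf switch split three different ways.
    for nodes, uplinks in [(18, 18), (24, 12), (30, 6)]:
        ratio = oversubscription(nodes, uplinks)
        print(f"{nodes} nodes, {uplinks} uplinks -> {float(ratio):g}:1")
```

More node ports per switch means fewer switches and cables for the same node count. Jobs that fit entirely within one leaf switch never cross the uplinks, so they are unaffected by the ratio.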

OSU's latest basic outline: Interconnects
- InfiniBand/Omni-Path: as cheap as we can get it
  - Highly oversubscribed: all our parallel jobs fit within a single switch
- GigE top of rack, uplinked to central 10G
- IPMI

Login & Management
- Login nodes
  - Get enough to handle all your users
  - >1 can give high availability
  - Round-robin DNS
- Management nodes
  - Much diversity in how this is done
  - Where you can run all the cluster-wide services
  - Depends on size of cluster, services needed
  - Very small clusters often have a single server for both login and management
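Round-robin DNS spreads users over several login nodes by publishing all of their addresses under one name and rotating the order of the answers. The toy simulation below shows only the rotation idea. It is not a DNS server; the class name is made up and the addresses come from the documentation-only range 192.0.2.0/24.

```python
from collections import deque


class RoundRobinResolver:
    """Toy model of round-robin DNS for a single hostname."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("need at least one address")
        self._addresses = deque(addresses)

    def resolve(self):
        """Return all addresses, then rotate so the next answer starts one later."""
        answer = list(self._addresses)
        self._addresses.rotate(-1)
        return answer


if __name__ == "__main__":
    # Hypothetical pair of login nodes behind one name.
    login = RoundRobinResolver(["192.0.2.11", "192.0.2.12"])
    for user in ["alice", "bob", "carol", "dave"]:
        # Most clients simply connect to the first address returned.
        print(user, "->", login.resolve()[0])
```

Plain round-robin DNS does not notice when a login node is down, so it spreads load but gives only limited high availability unless something removes dead nodes from the record.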

OSU's latest basic outline: Management
- 2x login nodes (same proc as compute)
- 3x mgmt nodes
- We also want vendor installation and support (remember: 1 FTE dedicated to the technical stuff)

Other optional bits
- Data transfer node (see ESnet for specs)
- Web interfaces/science gateways, etc.
- What else?
- OSU: yes to DTN; we already have sufficient webby stuff (aka virtual server pool)

Strategy
- We gave this list to any vendor that wanted to talk to us.
- We told them the budget.
- The resulting quotes were very informative: go over them carefully!
- We have plenty of time because we're waiting for the new power and cooling to be installed.
- We will go out for bid.

Watch out for:
- Enterprise vendors have a completely different mindset: uptime, redundancy, etc.
- The level of redundancy on a cluster is the compute node.
- Get references from someone who bought a similarly sized cluster from that vendor and who has a similarly sized staff.
- Compare notes with as many people as possible.

Be warned: No matter what you do, the minute you send out the PO you'll think of something you should've done differently. And it's okay, we all feel that way.

Topics from Etherpad
- Advertising specs: casual or RFP?
- Scheduler & policies: what meets your researchers' needs?
- Provisioning/mgmt. system: Rocks, xCAT, OpenHPC, Razor, Puppet, vendor supplied, etc.
- Replacement schedule: What's your funding cycle like?

Topics from Etherpad (continued)
- Single system vs. two? We keep our old system around a while to ease transition.
- Liquid cooling? $$$$
- Voltage: depends on how much and what you have available
  - UPS > 100 kW are usually (only?) 3 phase 408V
- Rack standards
- Space available?
- Density of cooling available?

Acquisition start to finish (was part 2 Get money plan)
- Spec system based on what's needed
- Get bids (informally or formally)
- Buy
- Don't sign acceptance before you test everything with at least some of your workload

Thanks! Questions?
Dana Brunson
dana.brunson@okstate.edu
OSU
http://hpcc.okstate.edu