Virtual Machine Provisioning and Performance for Scientific Computing


Davide Salomoni, INFN-CNAF

Outline: WNoDeS Updates and Cloud Provisioning; Performance Tests; Network Virtualization

WNoDeS Updates (1)
More flexibility in VLAN usage: e.g., a VLAN can be dedicated to a specific customer and instantiated on selected hypervisors (HVs) only. This will be used at CNAF to implement a Tier-3 infrastructure, where Tier-3 users can only run on certain hardware, which Tier-1 users can also exploit when Tier-3 users are not active. Important developments may still be needed on virtual networking (see later).
[Figure: a mix of Tier-1 and Tier-3 VMs running on shared hardware.]
libvirt is now used to manage and monitor VMs, either locally or via a web app (see later); a minimal monitoring sketch follows this slide.
Improved handling of VM images: automatic purge of old images on HVs; tags can now be associated with images; images are downloaded to HVs via HTTP or POSIX I/O.
Job migration is no longer needed, which enables WNoDeS to be ported to other LRMSs.
New way to support Cloud VMs: an LRMS is no longer needed on Cloud VMs.
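A minimal sketch of the kind of per-VM monitoring libvirt enables on a hypervisor, assuming the libvirt Python bindings and the qemu/KVM driver; this is only an illustration, not the WNoDeS integration itself.

```python
# Minimal sketch: list the domains on the local hypervisor and print basic
# resource information for each (assumption: libvirt-python is installed).
import libvirt

conn = libvirt.openReadOnly("qemu:///system")   # read-only connection to the local HV
for dom in conn.listAllDomains():
    state, max_mem, mem, ncpu, cpu_time = dom.info()
    print(f"{dom.name()}: state={state} vCPUs={ncpu} mem={mem // 1024} MB "
          f"cpu_time={cpu_time / 1e9:.0f} s")
conn.close()
```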

WNoDeS Updates (2)
Support for LVM partitioning (see the performance measurements): each VM will have its own LVM partition; this improves performance and helps ensure flexibility (e.g. for Cloud requests) and isolation. A provisioning sketch follows this slide.
Support for an sshfs or NFS gateway (see the performance measurements).
Command line tools to manage VM images.
New web applications for Cloud provisioning and for WNoDeS monitoring (see later).
Virtual Interactive Pools (VIP), presented at CHEP'10; an MD grant working on VIP, starting 4/2011.
Support for insertion of VM states into the DOCET database.
New plug-in architecture.
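A minimal sketch of carving a per-VM LVM logical volume on the hypervisor, assuming a volume group named vg_wnodes; the names and sizes are illustrative, not the actual WNoDeS provisioning code.

```python
import subprocess

def create_vm_volume(vm_name: str, size_gb: int = 50, vg: str = "vg_wnodes") -> str:
    """Create a per-VM logical volume for /tmp and local user data."""
    lv_path = f"/dev/{vg}/{vm_name}"
    subprocess.run(["lvcreate", "-L", f"{size_gb}G", "-n", vm_name, vg], check=True)
    subprocess.run(["mkfs.ext4", "-q", lv_path], check=True)
    return lv_path  # to be attached to the VM as a block device (e.g. a virtio disk)

if __name__ == "__main__":
    print(create_vm_volume("vm042"))
```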

Getting rid of bmig (job migration)
bmig (the original way for WNoDeS to move jobs from a bait to a VM) is supported by LSF but not, for instance, by PBS/Torque: a serious limit to porting WNoDeS to other LRMSs.
We are currently testing a reservation-based alternative: a bait asks its HV to prepare a VM according to the job requirements as usual, but then the bait makes the pre-exec script intentionally fail; this causes the job to be put back into its queue. The bait will also define a reservation request so that, when requeued, the job will be forced to run on its designated VM only, and attach this reservation to the requeued job (a hedged sketch of this flow follows this slide).
First tests with LSF were successful; stress tests under heavy load are still needed. Note that the job ID is unaffected (good).
A reservation-based mechanism also simplifies the interaction with the LRMS and puts less strain on it.
INFN Bari is going to test the same machinery with Torque/Maui. Thanks to G. Donvito and V. Spinoso.
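A hedged sketch of how the bait-side reservation step might look with LSF command-line tools; the exact flags (brsvadd options, bmod -U) and the output parsing are illustrative assumptions, not the actual WNoDeS code.

```python
import subprocess

def reserve_and_requeue(job_id: str, vm_host: str, slots: int = 1) -> None:
    # 1. Create an advance reservation pinned to the freshly prepared VM, so the
    #    requeued job can only land there (flags are illustrative assumptions).
    out = subprocess.run(
        ["brsvadd", "-n", str(slots), "-m", vm_host, "-b", "now", "-e", "+60"],
        capture_output=True, text=True, check=True,
    )
    rsv_id = out.stdout.split()[-1].strip('"')  # parsing of the reservation ID is illustrative

    # 2. Attach the reservation to the job that the failing pre-exec script sent
    #    back to its queue; the job ID itself is unchanged.
    subprocess.run(["bmod", "-U", rsv_id, job_id], check=True)
```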

Outline: WNoDeS Updates and Cloud Provisioning; Performance Tests; Network Virtualization

Provisioning
WNoDeS allows full customization of VMs: the parameters that define the VMs are all available to the system. Realistically, at the Tier-1 we have decided to characterize VMs according to a fixed set of parameters. This should answer most, if not all, practical requests, while at the same time limiting entropy; the billing model has to be set up accordingly.
There is more to this: definition of custom images, through modifications of pre-defined image sandboxes and subsequent saving and retrieval of the custom images; storage, dropbox-like (easy) or QoS-constrained (less easy). Golden rule: do not over-implement ahead of time ("premature optimization is the root of all evil", D. Knuth).
Specifically, for Cloud requests (see the sketch after this slide):
Small: 1 core, 1.7 GB RAM, 50 GB HD
Medium: 2 cores, 3.5 GB RAM, 100 GB HD
Large: 4 cores, 7 GB RAM, 200 GB HD
Extra-large: 8 cores, 14 GB RAM, 400 GB HD
Current Grid images normally fall into the Small instance. Two further options are foreseen, initially for Grid and VIP jobs:
Whole-node, hard: all hardware cores, (1.7 * num. cores) GB RAM, (50 * num. cores) GB HD
Whole-node, soft: all available cores (with a minimum), (1.7 * num. cores) GB RAM, (50 * num. cores) GB HD
Note: network and distributed storage are not considered above.
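The fixed set of flavors above can be summarized in a simple lookup table; this is a hypothetical representation for clarity, not the actual WNoDeS configuration format.

```python
# Instance flavors:  cores, RAM (GB), disk (GB)
FLAVORS = {
    "small":       (1,  1.7,  50),
    "medium":      (2,  3.5, 100),
    "large":       (4,  7.0, 200),
    "extra-large": (8, 14.0, 400),
}

def whole_node(num_cores: int) -> tuple:
    """Whole-node flavor: all cores, 1.7 GB RAM and 50 GB disk per core."""
    return (num_cores, 1.7 * num_cores, 50 * num_cores)

print(whole_node(8))   # (8, 13.6, 400)
```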

Cloud Provisioning
The hard way: API-based (e.g., OCCI). Not really meant for direct human consumption, and therefore essentially never used directly (at least at the INFN Tier-1); a hedged sketch of such a call follows this slide.
More practically: via a web-based application. We don't need yet another portal, though. We need to converge around the general concept of resource allocation and utilization, whether Grid, Cloud, or else (i.e., hopefully a single Grid or Cloud submission/allocation portal), with integrated authentication and authorization. There are several possibilities here; we would much prefer to re-use what we already have, namely VOMS and the Argus Authorization Service. Plenty of room for collaboration with IGI.
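For illustration, a hedged sketch of what an OCCI-style compute creation request looks like with the text/occi rendering; the endpoint URL, port and attribute values are assumptions, and real deployments typically also require X.509/VOMS proxy authentication.

```python
import requests

headers = {
    "Content-Type": "text/occi",
    "Category": 'compute; scheme="http://schemas.ogf.org/occi/infrastructure#"; class="kind"',
    "X-OCCI-Attribute": "occi.compute.cores=2, occi.compute.memory=3.5",
}
# Hypothetical OCCI endpoint; a successful creation returns the new VM's Location.
resp = requests.post("https://cloud.example.org:8787/compute/", headers=headers)
print(resp.status_code, resp.headers.get("Location"))
```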

Cloud Provisioning and WNoDeS Administration
Two MD theses on this, to be discussed soon.
VOMS/Argus are partially integrated into the Cloud portal. Users select Cloud instances from a few pre-defined configurations and receive an ssh key pair to access the allocated VMs; multiple VMs can be instantiated at once.
A treemap-based representation of the running VMs is provided for administrative purposes, with details on CPU and I/O utilization.
As mentioned, the plan is to integrate this into a more general resource access portal.

Outline: WNoDeS Updates and Cloud Provisioning; Performance Tests; Network Virtualization

Performance test of alternatives to mounting GPFS on VMs
The issue (not strictly GPFS-specific) is that any CPU core may become a GPFS (or any other distributed FS) client. This leads to GPFS clusters of several thousand nodes; this is large even according to IBM, requires special care and tuning, and may impact performance and functionality of the cluster.
We investigated two alternatives, both assuming that an HV distributes data to its VMs (see the mount sketch after this slide):
sshfs, a FUSE-based solution;
a GPFS-to-NFS export.
[Figure: on the left, VMs mounting GPFS directly from the GPFS-based storage; on the right, the hypervisor mounting GPFS and re-exporting it to its VMs via sshfs or NFS.]
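A minimal sketch of the two gateway approaches as they might be set up by hand; hostnames, paths and export options are assumptions (WNoDeS automates this).

```python
import subprocess

HV = "hv01.example.org"          # hypervisor that mounts GPFS natively
GPFS_PATH = "/gpfs/data"         # GPFS path re-exported to the VMs

# Option 1: sshfs from the VM to the hypervisor (FUSE-based).
subprocess.run(["sshfs", f"{HV}:{GPFS_PATH}", "/mnt/data"], check=True)

# Option 2: NFS export of the GPFS path on the hypervisor side...
# subprocess.run(["exportfs", "-o", "rw,no_root_squash", f"10.0.0.0/24:{GPFS_PATH}"], check=True)
# ...and a plain NFS mount on the VM:
# subprocess.run(["mount", "-t", "nfs", f"{HV}:{GPFS_PATH}", "/mnt/data"], check=True)
```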

sshfs vs. nfs: throughput
sshfs throughput is constrained by encryption (even with the lowest possible encryption level). There is a marked improvement (throughput better than nfs) when using sshfs with no encryption through socat, especially with some tuning (a sketch of the socat transport follows this slide). File permissions are not straightforward with socat, though, which complicates e.g. glexec-based mechanisms.
[Chart: write and read throughput in MB/s for sshfs via socat, sshfs with arcfour, sshfs via socat + options (direct_io, no_readahead, sshfs_sync), nfs, and GPFS mounted directly on the VMs (current setup).]
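A hedged sketch of the "sshfs with no encryption through socat" configuration measured above; the sftp-server path, port and hostnames are assumptions, while the mount options follow the slide (direct_io, no_readahead, sshfs_sync).

```python
import subprocess

PORT = "7000"

# On the hypervisor: expose sftp-server over a plain TCP socket (no ssh, no encryption).
hv_cmd = ["socat", f"TCP-LISTEN:{PORT},fork,reuseaddr",
          "EXEC:/usr/libexec/openssh/sftp-server"]

# On the VM: mount through that socket, bypassing ssh entirely.
vm_cmd = ["sshfs", "-o", f"directport={PORT}",
          "-o", "direct_io,no_readahead,sshfs_sync",
          "hv01.example.org:/gpfs/data", "/mnt/data"]

subprocess.Popen(hv_cmd)            # run on the hypervisor
subprocess.run(vm_cmd, check=True)  # run on the VM
```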

sshfs vs. nfs: CPU usage
[Charts: hypervisor and VM CPU load (usr/sys) during writes and reads for sshfs via socat, sshfs with arcfour, sshfs via socat + options (direct_io, no_readahead, sshfs_sync), nfs, and GPFS on the VMs (current setup).]
Overall, socat-based sshfs with appropriate options seems to be the best performer.

VM-related Performance Tests
All tests used RHEL 6, since SL6 was not available yet.
Classic HEP-SPEC06 for CPU performance; iozone to test local I/O (an invocation sketch follows this slide). Network I/O is not shown here: virtio-net has already been proven to be quite efficient (90% or more of wire speed).
Local I/O has historically been a problem for VMs, and WNoDeS is not an exception, especially due to its use of the KVM -snapshot flag. The new WNoDeS release will still use -snapshot, but for the root partition only; /tmp and local user data will reside on a (host-based) LVM partition.
Several things are improving in the I/O area, though. KVM specifics: page sharing (KSM), Transparent Huge Pages (tests ongoing); plus network-related optimizations not shown here, namely vhost-net, SR-IOV and vmchannel.
Note: in our performance tests, we disabled (or at least tried to disable) caching.
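A sketch of an iozone run with caching bypassed via O_DIRECT, in the spirit of the tests above; the record size and temporary file path are assumptions, and the actual Tier-1 test parameters are not reproduced here.

```python
import subprocess

cmd = [
    "iozone",
    "-I",            # use O_DIRECT, bypassing the page cache
    "-s", "4g",      # 4 GB file size, as in the tests above
    "-r", "1m",      # record size (assumption)
    "-i", "0",       # write / rewrite
    "-i", "1",       # read / reread
    "-i", "2",       # random read / random write
    "-f", "/tmp/iozone.tmp",
]
subprocess.run(cmd, check=True)
```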

HS06 on Hypervisors and VMs (Intel E5420)
Slight performance increase of RHEL6 vs. SL5.5 on the hypervisor: around +3% (with the exception of 12 instances: -4%).
Performance penalty of SL5.5 VMs on an SL5.5 HV: -2.5%.
Unexpected performance loss of SL5.5 VMs on RHEL6 vs. SL5.5 HV (-7%).
Tests to be completed with multiple VMs.
HS06, Intel E5420 (single instance): SL5.5 HV 13.50; RHEL6 HV 13.97; SL5.5 VM on SL5.5 HV 13.16; SL5.5 VM on RHEL6 HV 12.28.
HS06, Intel E5420 on the HV vs. number of instances: 1 instance: 13.50 (SL5.5) vs. 13.97 (RHEL6); 4 instances: 47.55 vs. 48.89; 8 instances: 68.58 vs. 70.74; 12 instances: 72.52 vs. 69.51.

iozone on SL5.5 (SL5.5 VMs)
iozone tests with caching disabled, file size 4 GB, run on the VMs; the host with SL5.5 is taken as the reference. A VM on SL5.5 with just -snapshot crashed.
Based on these tests, WNoDeS will support -snapshot for the root partition and a native LVM partition for /tmp and user data (a hedged sketch of this disk layout follows this slide). A per-VM single file or partition would generally perform better, but then we would practically lose instantiation dynamism.
[Chart: iozone on SL5.5 relative to the host on SL5.5 (write, rewrite, read, reread, random read, random write) for the configurations "vm sl5.5 file", "vm sl5.5 lvm snap" and "vm sl5.5 nfs snap".]
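A hedged sketch of the disk layout described above for a KVM guest: the root disk image in snapshot mode, /tmp and user data on a host LVM volume attached directly. Device paths, image names, flavor sizing and the virtio bus are assumptions; this is not the actual WNoDeS command line.

```python
import subprocess

qemu_cmd = [
    "qemu-kvm",
    "-m", "1700",                                   # 1.7 GB RAM (Small flavor)
    "-smp", "1",
    # Root partition: shared base image, writes discarded at shutdown.
    "-drive", "file=/var/lib/wnodes/sl55-base.img,if=virtio,snapshot=on",
    # /tmp and user data: per-VM LVM logical volume, no host caching.
    "-drive", "file=/dev/vg_wnodes/vm042,if=virtio,cache=none,aio=native",
]
subprocess.run(qemu_cmd, check=True)
```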

iozone on RHEL6 (SL5.5 VMs)
Consistently with what was seen in some CPU performance tests, iozone on RHEL6 surprisingly often performs worse than on SL5.5.
RHEL6 supports native AIO and preadv/pwritev, which group together memory areas before reading or writing them; this is perhaps the reason for some odd results (unbelievably good performance) in the iozone benchmark.
Assuming RHEL6 performance will be improved by Red Hat, using -snapshot for the root partition and a native LVM partition for /tmp and user data in WNoDeS seems a good choice here as well. But we will not upgrade the HVs to RHEL6/SL6 until we are able to get reasonable results in this area.
[Charts: absolute iozone results on RHEL6, and iozone on RHEL6 relative to SL5.5 (write, rewrite, read, reread, random read, random write), for the configurations "host rhel6", "vm sl5.5 file", "vm sl5.5 file snap", "vm sl5.5 file snap aio", "vm sl5.5 lvm snap aio" and "vm sl5.5 nfs snap".]

Outline: WNoDeS Updates and Cloud Provisioning; Performance Tests; Network Virtualization

A Missing Step: Network Virtualization
Server virtualization has progressed steadily in the past years; network virtualization much less so. Enterprise networks are often static, locked down, proprietary, complex.
Key missing features (a sketch of some of them follows this slide):
inter-VM traffic analysis, with support for NetFlow, sFlow, SPAN or OpenFlow;
VM interface rate limiting and per-port QoS policies (per-flow management);
per-customer VLANs, with private/public IP address assignment (note: normally up to 4096 VLANs; cf. proposals like RFC 5517 or IEEE 802.1ad/802.1ah);
dynamic reconfiguration of the network state per VM or per VM group: the network state should become a property of the virtual interface.
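A hedged sketch of what some of these features look like with Open vSwitch, used here only as one possible implementation (the slide does not mandate a specific virtual switch); bridge and port names and the NetFlow collector address are assumptions.

```python
import subprocess

def ovs(*args: str) -> None:
    subprocess.run(["ovs-vsctl", *args], check=True)

# Per-customer VLAN: tag the VM's virtual interface on the software bridge.
ovs("set", "port", "vnet0", "tag=200")

# Per-VM rate limiting on the same interface (values in kbps / kB).
ovs("set", "interface", "vnet0",
    "ingress_policing_rate=100000", "ingress_policing_burst=10000")

# Inter-VM traffic analysis: export NetFlow records from the bridge.
ovs("--", "set", "bridge", "br-vm", "netflow=@nf",
    "--", "--id=@nf", "create", "NetFlow", 'targets="10.0.0.10:2055"')
```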

Virtual Plane, VMs
[Figure: physical plane (hardware, L2/L3 network) and virtual plane hosting VMs on VLANs 200 and 300.]

Virtual Plane, VMs + Network
[Figure: as before, but the virtual plane now also includes virtual networks (vnet) carrying the VMs on VLANs 200 and 300, decoupled from the physical L2/L3 layer.]

Virtual Network Topology
[Figure: hypervisors hosting VMs on VLANs 200 and 300, each attached to per-VLAN vswitches; the vswitches are linked by GRE tunnels (or a bi-directional channel) into a global (distributed) virtual switch, with dnsmasq providing addressing and a gateway towards the Internet.]
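A hedged sketch of stitching per-hypervisor virtual switches into one overlay with GRE tunnels, in the spirit of the figure above; Open vSwitch is used as an example implementation, and the bridge names, peer IP and dnsmasq range are assumptions.

```python
import subprocess

def run(*args: str) -> None:
    subprocess.run(list(args), check=True)

# On each hypervisor: a bridge for VLAN 300 plus a GRE tunnel to the peer HV.
run("ovs-vsctl", "add-br", "vswitch300")
run("ovs-vsctl", "add-port", "vswitch300", "gre0",
    "--", "set", "interface", "gre0", "type=gre", "options:remote_ip=192.168.1.2")

# On one node of the overlay: dnsmasq hands out addresses to the VMs on the
# virtual network (interface and DHCP range are illustrative).
run("dnsmasq", "--interface=vswitch300", "--dhcp-range=10.30.0.10,10.30.0.200,12h")
```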

Virtual Network Topology for Multiple Centers
Since the overlay network is served by a virtual switch, nothing prevents dynamically extending the overlay network to multiple centers, e.g. to transparently connect remote resources and make them available, for instance, for flash requests. QoS considerations will play an important role. This may integrate with more network-centric initiatives like FEDERICA/2. An MD thesis on WNoDeS dynamic virtual networking is starting 4/2011.
[Figure: three sites (A, B, C), each with a virtual switch and private VLANs (e.g. VLAN 200), interconnected over the Internet into a single overlay.]

Conclusions
WNoDeS is evolving, thanks to the experience gained at the Tier-1. It is installed and running also here (LNL Tier-2), thanks especially to G. Maron, M. Biasotto and A. Crescente (anybody interested in trying it out is welcome). An important goal is to have it running on LRMSs other than LSF.
Interactions with distributed file systems in large clusters may be complicated (not really a WNoDeS-specific issue).
The flexibility of the system is being exploited, e.g. by the VIP interface. VM provisioning can have many degrees of freedom, and from the user's point of view it should really be integrated into a coherent Grid/Cloud portal.
Performance testing is always interesting, and we are getting better as we understand old and new knobs (CPU, I/O). VM performance tuning and testing may be quite different from similar conventional activities. Several related things still need to be done (e.g. pinning, brokerage).
Network virtualization is perhaps not a big issue with conventional ("Grid") resource usage, but becomes essential with Cloud-related assignments. It is still quite an R&D area, with good opportunities for collaboration with network providers.
The difficulty is not so much in virtualizing (even a large number of) resources. It is much more in having a scalable, extensible, efficient system, integrated with storage and with Grid, local and Cloud interfaces.

Thanks
A. Chierici: performance tuning and tests.
A.K. Calabrese: sshfs vs. nfs vs. GPFS performance tuning and tests.
A. Italiano, G. Dalla Torre: WNoDeS core.
G. Potena, L. Cestari, D. Andreotti: WNoDeS Cloud interface and web apps.
C. Grandi: VIP testing.
wnodes@lists.infn.it, http://web.infn.it/wnodes