CCIN2P3 Cloud for the ATLAS VO


Centre de Calcul de l'Institut National de Physique Nucléaire et de Physique des Particules
CCIN2P3 Cloud for the ATLAS VO
Vamvakopoulos Emmanouil
«Rencontres LCG France», IRFU Saclay, 1-2 December 2014

VO needs and objective
ATLAS would like to use opportunistic cloud resources, primarily for simulation (single- and multi-core jobs).
Deploy virtual machines as «worker nodes» on a cloud IaaS (Infrastructure as a Service) and interface those VMs with the Grid system.
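
As a rough illustration of launching such a worker-node VM on an OpenStack-style IaaS, here is a minimal, hypothetical sketch; the flavor, image, key and file names are placeholders and not taken from the talk:

# Hypothetical example: boot a CernVM-based worker node with cloud-init user data
nova boot --flavor m1.xlarge --image cernvm-3 \
  --key-name atlas-ops --user-data user-data.txt \
  atlas-wn-01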

ATLAS Cloud news and updates
ATLAS Cloud Operations group: atlas-adc-cloudcomputing@cern.ch
Deploy the available, established solutions in order to bridge Grid computing to cloud computing and to ensure smooth operation of the cloud resources.

Review: context diagram of the BNL Elastic Cluster (AutoPyFactory, https://www.racf.bnl.gov/experiments/usatlas/griddev/autopyfactory) and the BNL/Amazon AWS pilot.

ATLAS Cloud operation recommendations
Cloud operations:
Use of CernVM (3.x)
Standard cloud-init contextualization templates
Publish to the federated Ganglia monitoring tool (http://agm.cern.ch)
Central AutoListener: «CloudScheduler»
Not 100% mandatory
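
To illustrate what such a cloud-init contextualization could look like, here is a minimal, hypothetical user-data sketch; the package names and service commands are assumptions, not the official ATLAS templates:

# Hypothetical cloud-init user data for a worker-node VM
cat > user-data.txt <<'EOF'
#cloud-config
packages:
  - cvmfs
  - condor
  - ganglia-gmond
runcmd:
  - [ service, autofs, restart ]     # make /cvmfs available
  - [ service, condor, start ]       # join the HTCondor pool
  - [ service, gmond, start ]        # report to the Ganglia federation
EOF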

ATLAS Ganglia monitoring: http://agm.cern.ch/

WLCG cloud usage resource dashboard
Mixed sources (VO WMS / EGI APEL / Ganglia monitoring)
http://cloud-acc-dev.cern.ch

Cloud resource integration at IN2P3-CC
IN2P3-CC_T3 registered as an ATLAS site (AGIS): a DDM-endpoint-less site (uses the T1's dCache storage)
PanDA queues: IN2P3-CC_T3_VM01 (online), ANALYSIS_IN2P3-CC_T3_VM01 (offline)
Separate view in the Historical View and SSB dashboards

BNL Elastic Cluster solution: APF + condor_ec2; HammerCloud test framework / user
Context diagram: PanDA (the VO WMS) holds the number of activated jobs ready to dispatch; APF + CondorEC2 submits pilots and VMs so that the number of running jobs in the Condor cluster follows the number of activated jobs.
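
For illustration, VM provisioning through condor_ec2 can be expressed as an HTCondor EC2 grid-universe submit file; the sketch below is hypothetical (AMI ID, credential paths and instance type are placeholders, not the BNL configuration):

# Hypothetical HTCondor EC2 grid-universe submit file for one worker-node VM
cat > atlas-wn-vm.submit <<'EOF'
universe              = grid
grid_resource         = ec2 https://ec2.us-east-1.amazonaws.com/
# "executable" is only a label here; nothing is executed locally
executable            = atlas-wn-vm
ec2_access_key_id     = /home/apf/.ec2/access_key
ec2_secret_access_key = /home/apf/.ec2/secret_key
ec2_ami_id            = ami-00000000
ec2_instance_type     = m3.xlarge
ec2_user_data_file    = user-data.txt
queue 1
EOF
condor_submit atlas-wn-vm.submit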

Benchmark: ATLAS MC jobs in VMs on CC@Cloud*
Real WNs: Intel Xeon E5540 @ 2.53 GHz, 16 CPUs (hyperthreading), 48 GB RAM, 160 GB scratch
CERN P1 StressTest, mc12, AtlasG4_trf 17.2.2 (512), AtlasProduction/17.2.2, dataset mc12_8tev.175590.herwigpp_pmssm_dstau_mSL_120_M1_000.evgen.EVNT.e1707_tid01212395_00_derHCBM
Overhead (wallclock) ~30%, with 16 concurrent jobs on both partitions, for ~7 cycles
* VMs: 16 VCPU / 32 GB RAM / 160 GB scratch
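
Read here as the relative wallclock increase of the virtualized jobs over the bare-metal ones (my interpretation, not stated on the slide): overhead = (t_VM - t_real) / t_real, so a job that takes 100 min on the real worker node and ~130 min in the VM corresponds to an overhead of ~30%.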

Next steps
Verify the benchmark on the latest hardware
Optimize memory performance, e.g. NUMA topology vs. the size of the image
Deploy the CernVM 3.x image solution (PaaS)
Establish a local fair-share policy
Finalize operational details: logging and activity tracking

Thank you for your attention!

Backup slides

Memory performance and NUMA topology
Intel Xeon Processor E5540 (8M cache, 2.53 GHz, 5.86 GT/s Intel QPI)
«Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor.»
Demonstration of the non-uniform memory access pattern, e.g.:
numactl --membind 1 --physcpubind 0 ./ramsmp -b 6 -m 32 -p 1 -l 3 (*)
Local --> ~8563.08 MB/s, Remote --> ~5855.48 MB/s
http://vmstudy.blogspot.gr/2010/09/kvm-memorycpu-benchmark-with-numa.html
(*) ramsmp: http://www.alasir.com
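
To reproduce the local vs. remote comparison, one plausible pairing of the slide's command is the following; the ramsmp flags are taken verbatim from the slide, while the assignment of CPU 0 to node 0 is an assumption about the host topology:

# Local access: memory and CPU on the same NUMA node (node 0)
numactl --membind 0 --physcpubind 0 ./ramsmp -b 6 -m 32 -p 1 -l 3
# Remote access: memory on node 1, CPU still on node 0 (the command shown on the slide)
numactl --membind 1 --physcpubind 0 ./ramsmp -b 6 -m 32 -p 1 -l 3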

Benchmark: HepSpec06
Integer benchmarks: 471.omnetpp (C++, discrete event simulation), 473.astar (C++, path-finding algorithms), 483.xalancbmk (C++, XML processing)
Floating-point benchmarks: 444.namd (C++, biology / molecular dynamics), 447.dealII (C++, finite element analysis), 450.soplex (C++, linear programming, optimization), 453.povray (C++, image ray-tracing)
[Plot: overhead (%) per application (HepSpec06 component)]
Total overhead (on the HepSpec06 mark) ~17%
The individual overheads (wall clock) per HepSpec06 component show large dispersion: floating-point vs. integer applications?

Image software stack and size
BoxGrinder build: SL 6.5, HEP OS libs, CVMFS, HTCondor
cloud-init modules: CVMFS, Condor, Ganglia
VO profile from CVMFS; site configuration from CVMFS/AGIS; valid defaults only for CC
32 GB RAM, 16 VCPU, 160 GB scratch space
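
As a rough illustration of how such an image could be described for BoxGrinder, here is a hypothetical appliance definition; the package list, sizes and names are assumptions and not the actual CC-IN2P3 recipe:

# Hypothetical BoxGrinder appliance definition (atlas-wn.appl)
cat > atlas-wn.appl <<'EOF'
name: atlas-wn
summary: ATLAS worker-node image for the CC-IN2P3 cloud
os:
  name: sl
  version: 6
hardware:
  cpus: 16
  memory: 32768
  partitions:
    "/":
      size: 10
packages:
  - cvmfs
  - condor
  - ganglia-gmond
  - cloud-init
EOF
# Build the image (delivery/platform options depend on the installed plugins)
boxgrinder-build atlas-wn.appl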

CernVM 3 and µCernVM
«µCernVM is the heart of the CernVM 3 virtual appliance. It is based on Scientific Linux 6 combined with a custom, virtualization-friendly Linux kernel. This image is also fully RPM based; you can use yum and rpm to install additional packages.»
Small image size, ~20 MB
Includes cvmfs, condor, ganglia
Contextualization: HEPiX, EC2, cloud-init
Stripped kernel with the latest paravirtualized drivers
Versioned OS on CernVM-FS
Supports system updates (via UnionFS)
http://cernvm.cern.ch
J. Blomer et al., 2014 J. Phys.: Conf. Ser. 513 032007, "Micro-CernVM: slashing the cost of building and deploying virtual machines"

KVM test + NUMA

[root@ccwntest16 vamvakop]# virsh numatune cloudatlas
numa_mode      : strict
numa_nodeset   : 0
[root@ccwntest16 vamvakop]# virsh numatune cloudatlas2
numa_mode      : strict
numa_nodeset   : 1

virsh vcpuinfo cloudatlas | grep CPU: | paste - -
VCPU: 0    CPU: 0
VCPU: 1    CPU: 2
VCPU: 2    CPU: 4
VCPU: 3    CPU: 6
VCPU: 4    CPU: 8
VCPU: 5    CPU: 10
VCPU: 6    CPU: 12
VCPU: 7    CPU: 14

Per-node process memory usage (in MBs)
PID                 Node 0      Node 1      Total
------------------  ----------  ----------  ----------
17998 (qemu-kvm)    14710.16    0.01        14710.16
18039 (qemu-kvm)    5.43        14744.73    14750.15
------------------  ----------  ----------  ----------
Total               14715.58    14744.73    29460.32

Cgroup-enabled memory cpuset
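
For reference, a binding like the one inspected above could be applied with virsh itself; a minimal sketch, assuming the two guests named on the slide and 8 vCPUs pinned to even-numbered host CPUs (the exact CC-IN2P3 procedure is not shown in the talk):

# Pin guest memory strictly to one NUMA node (running domain and its config)
virsh numatune cloudatlas  --mode strict --nodeset 0 --live --config
virsh numatune cloudatlas2 --mode strict --nodeset 1 --live --config
# Pin each virtual CPU of the first guest to an even-numbered physical CPU on node 0
for vcpu in 0 1 2 3 4 5 6 7; do
    virsh vcpupin cloudatlas ${vcpu} $((vcpu * 2))
done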

Tests: ATLAS mc12 MC events
[Plot: overhead (%) per configuration: HTC16.32, KVM016.32, KVM08.16]