Visit of a delegation of Italian companies (Visita delegazione ditte italiane)

Visit of a delegation of Italian companies. CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it. Massimo Lamanna / CERN IT Department, Data Storage Services group.

Innovation in Computing in High-Energy Physics: demanding science, demanding computing, power usage, innovation (invention of the Web, Grid computing: the LHC Computing Grid).

Example: CMS. CMS is a general-purpose detector with the same physics goals as ATLAS, but different technical solutions and design. It is built around a huge superconducting solenoid: a cylindrical coil of superconducting cable that generates a magnetic field of 4 T, about 100,000 times that of the Earth. About 4000 people work on CMS, from 182 institutes in 42 countries (May 2013). Detector: 21 m long, 15 m high and 15 m wide; 12,500 t (http://cmsinfo.cern.ch/outreach). Collisions (bunch crossings) occur at 40 MHz; the typical data rate (RAW data; pp) is 300 MB/s, and in Run 2 rates will go up: CMS expects ~1 GB/s, with O(100) Hz of selected events.

Reconstruction, analysis and simulation software for the LHC experiments: O(10^7) lines of C++. Large sections are experiment specific and written by the community (100s of developers); physicists are needed here. The code is used by 1000s of physicists, who actively change it. Querying/skimming the large data volumes means running custom, complex programs over them.
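To give a flavour of what such querying/skimming looks like in practice, below is a minimal sketch of an event skim using ROOT's Python bindings. The file name, tree name and the selection cut (nMuon >= 2) are hypothetical placeholders, not taken from the slides.

# Minimal skim sketch with PyROOT (hypothetical file, tree and cut).
import ROOT

infile = ROOT.TFile.Open("events.root")   # hypothetical input file
tree = infile.Get("Events")               # hypothetical tree name

outfile = ROOT.TFile("skim.root", "RECREATE")
skim = tree.CloneTree(0)                  # empty tree with the same branch structure

for event in tree:                        # the custom, complex selection code goes here
    if event.nMuon >= 2:                  # hypothetical branch used as an example cut
        skim.Fill()

skim.Write()
outfile.Close()
infile.Close()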

CERN Computer Centre. The CERN computer centre was built in the 1970s on the CERN site (Meyrin, Geneva): ~3000 m² (4 main machine rooms), 3.5 MW for equipment, estimated PUE ~1.6. A new extension is located at Wigner (Budapest): ~1000 m², 2.7 MW for equipment, connected to the Geneva computer centre with 2x100 Gb/s links (21 and 24 ms RTT).

2014 Overall Numbers. ~9 PB written in 2014 (2013: ~10 PB); 98.5 PB on 305 M files in total; ~14 PB read in 2014 (2013: ~23 PB). Reduced user activity (no LHC data taking); during an LHC run: 1 PB/week (or more!). Infrastructure: disk farms, CASTOR + EOS (40 PB, 80 PB), ~1600 disk servers (essentially PCs with 24 disks); tapes: 7 libraries, 65 K slots, 50 K tapes, 141 drives; CPU: 100,000 cores.

Data distribution (e.g. CERN distributes RAW data to the Tier-1s); (large) processing campaigns run on pre-placed data (e.g. reprocessing). Downloading a small sample for local analysis is possible, but the main model is to scale out with grid jobs (the user executable is dispatched where the data are), using file parallelism (data set → list of files → list of independent jobs), federating storages and recalling data "on the fly". A minimal sketch of the file-parallelism idea follows below.
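To make the file-parallelism idea concrete, here is a minimal, self-contained Python sketch (not the experiments' actual grid middleware): a data set is treated as a list of files and one independent job description is generated per file. The file URLs and the executable name are hypothetical examples.

# Sketch: turn a data set (list of files) into independent per-file jobs.
# The dataset contents and the executable name are hypothetical.

dataset = [
    "root://site-a.example.org//store/data/run1/file_000.root",
    "root://site-b.example.org//store/data/run1/file_001.root",
    "root://site-c.example.org//store/data/run1/file_002.root",
]

def make_job(input_file, job_id):
    """Describe one independent job processing a single file."""
    return {
        "id": job_id,
        "executable": "analyse.py",  # user code, dispatched where the data are
        "arguments": ["--input", input_file, "--output", f"out_{job_id}.root"],
    }

jobs = [make_job(f, i) for i, f in enumerate(dataset)]

for job in jobs:
    # In reality these descriptions would be submitted to the grid/batch system;
    # here we just print them.
    print(job["id"], job["executable"], " ".join(job["arguments"]))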

Intercontinental links (data and jobs). Asia-Pacific area: Taipei (Tier-1), Tokyo, Beijing, Seoul, Melbourne, Mumbai. North America: BNL and Fermilab (US Tier-1s), Victoria (Canadian Tier-1) and many Tier-2s such as Stanford, MIT, Wisconsin, Argonne. South America: several Tier-2s. Africa: a few sites, such as the South African Tier-2s.

CMS processing: wall-clock consumption of Tier-0 and Tier-1 processing (top plot: last week; bottom plot: Oct 2012, data taking). The load is sizeable even with no data taking: continuous reprocessing. Reconstruction activities (RAW → reconstructed objects) are organised processing whose output feeds the physicists' analysis; they can still access RAW data if needed, but the final analysis is more efficient on the files containing the reconstructed objects. The grid never sleeps.

Storage Strategy @ CERN

Two interesting directions: innovation for heavy-duty tasks (EOS) and solutions for collaboration (CERNBox); in addition, close contacts with technology leaders (Ceph).

EOS: large disk farms for physics and beyond. Currently ~25 PB used out of ~60 PB of quota (as of Dec 2014). Developed in CERN/IT (DSS group). Original goal: large-scale analysis of LHC data (PBs for 100s/1000s of independent scientists), with an arbitrary level of data durability via cross-node file replication or RAIN, using commodity hardware. Status: open to non-physics use cases. NB: a large number of access protocols is available (a hedged read sketch follows below).
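Since EOS is natively accessed over the XRootD protocol (one of the several protocols alluded to above), a minimal read sketch using the XRootD Python bindings might look like the following; the host name and file path are hypothetical placeholders and a working XRootD client installation is assumed.

# Sketch: read the start of a file from an EOS instance over XRootD.
# Host and path are hypothetical; the XRootD Python bindings must be installed.
from XRootD import client

f = client.File()
status, _ = f.open("root://eos.example.cern.ch//eos/example/path/data.root")  # hypothetical URL
if not status.ok:
    raise RuntimeError(status.message)

status, data = f.read(offset=0, size=1024)  # read the first kilobyte
print(len(data), "bytes read")
f.close()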

EOS installed across the CERN computer centres. EOS takes advantage of the two CERN computer centres, coping with the ~20 ms latency and distributing copies across the two sites for dependability and performance. Status: as of today we are crossing the 30% mark and expect to be at ~50% next year; new acquisitions happen once per year (10s of PB, 100s of boxes), adding capacity and replacing obsolete boxes. Bottom line: not at all trivial. It is a unique tool, developed for our needs, but much broader applications are possible.

Towards large-scale data sync and share. Starting point: the classical Dropbox use case, favouring usability and ease of use over raw performance. Based on ownCloud; currently deployed as the CERNBox beta, with the data in our data centre! But can we bring this system to the next level, to our core-business workflows and large-scale workloads: exposing PBs of existing data from day 1, integration into physics data processing and central services (batch, interactive data analysis applications), syncing higher data volumes at higher rates? Can we still keep the simplicity of cloud storage access? Yes: using EOS as a backend, with seamless integration of the work environment (mobile devices) and the CERN IT central services (batch and grid).

Architecture (diagram). Sync clients (WebDAV) and web access (HTTPS) reach the service through HTTPS load balancers. Metadata flows through the ownCloud layer (kHz metadata operations, FUSE access to the storage); data flows directly between the user and the storage (HTTP for public data, HTTPS for private data, internal HTTP between components). All sync state is kept as metadata in the storage, and files are written with the user's credentials. The storage backend (EOS) redirects I/O to the disk servers (1000s of them) behind a single namespace.
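As a hedged illustration of the WebDAV path used by the sync clients, the sketch below uploads a single file with plain HTTP PUT; the endpoint URL, the "remote.php/webdav" path and the credentials are hypothetical placeholders for whatever the actual deployment exposes.

# Sketch: upload a file over WebDAV, the protocol used by the sync clients.
# The endpoint URL and credentials are hypothetical placeholders.
import requests

url = "https://cernbox.example.cern.ch/remote.php/webdav/notes.txt"  # hypothetical endpoint
with open("notes.txt", "rb") as payload:
    response = requests.put(url, data=payload, auth=("username", "password"))

# 201 Created (new file) or 204 No Content (overwrite) indicate success.
response.raise_for_status()
print("upload finished with HTTP status", response.status_code)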

~40 participants (from outside HEP). Keynote: B. Pierce (University of Pennsylvania). Sessions on technology & users, site reports, and vendor talks (IBM, PowerFolder, SeaFile, Pydio, ownCloud).

Immediate access to all our data! (EOS, Spring 2014)

Ceph @ CERN. Probably the most promising cloud storage technology, going hand in hand with our OpenStack infrastructure: testing began in early 2013 and a 3 PB cluster was deployed in August 2013. Our use cases: OpenStack Cinder volumes to offer persistent, thinly provisioned disks to our VMs; an OpenStack Glance image repository for system images and VM snapshots; consolidating our NFS and OpenAFS storage services on Ceph block devices; future large-scale object stores for physics data (a minimal sketch follows below). We built and maintain a close collaboration with Inktank (now Red Hat): operations experience with one of the largest clusters in the world, invited presentations at Ceph Days in London and Frankfurt, and development contributions in the Ceph source from three CERN IT staff.
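To illustrate the object-store use case, a minimal sketch with the librados Python bindings could look like this; the pool and object names are hypothetical, and a reachable cluster with /etc/ceph/ceph.conf and a valid keyring is assumed.

# Sketch: write and read back one object with the librados Python bindings.
# Pool and object names are hypothetical; a configured Ceph client is assumed.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

ioctx = cluster.open_ioctx("physics-objects")        # hypothetical pool name
ioctx.write_full("event-batch-0001", b"raw event payload")
data = ioctx.read("event-batch-0001")
print(len(data), "bytes read back")

ioctx.close()
cluster.shutdown()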

OpenStack with Ceph @ CERN. Linear growth, with nearly 300 TB consumed (including 3x replication): close to 700 volumes consuming >250 TB of data and more than 1000 machine images. Ceph runs on our standard physics data servers, which are not tuned for Ceph, augmented with SSDs to improve low-latency, high-IOPS performance. We make frequent contributions of operational experience (and patches) back to the community. (Plots: increasing space usage; growing number of Cinder volumes.)
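As a hedged illustration of how such a Cinder volume might be requested programmatically, assuming the openstacksdk client library and a configured cloud entry (here hypothetically named "cern" in clouds.yaml):

# Sketch: request a Cinder volume via openstacksdk.
# The cloud name and volume name are hypothetical; credentials are assumed
# to be available in clouds.yaml or the environment.
import openstack

conn = openstack.connect(cloud="cern")  # hypothetical cloud entry

volume = conn.block_storage.create_volume(
    name="analysis-scratch",  # hypothetical volume name
    size=100,                 # size in GB; thinly provisioned on the Ceph backend
)
print("created volume", volume.id, "with status", volume.status)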

Future Plans for Ceph @ CERN. A new instance at the CERN data centre in Budapest in 2015, to offer a volume service to the VMs in Budapest and redundancy for disaster recovery and business continuity. Ongoing development for physics data storage on Ceph, with two approaches: thin storage gateways to adapt our existing storage systems to a Ceph backend, and co-hosting our physics service gateways alongside the Ceph OSDs, which minimizes duplicated network traffic; a 10 PB test is planned for 2015. Now that Red Hat is the caretaker of Ceph, we're hopeful for a close integration with RHEL 7.x, which could enable a native NFS-like home directory service (e.g. CephFS) and enterprise databases.

Conclusions and Q&A. The challenge: make LHC analysis possible, with an IT infrastructure providing services that are dependable, cost effective and high performance for large-scale international collaborations. It is not only pure number crunching (or byte storing): concurrent access (high-performance applications), remote access (cloud computing) and collaborative access (large user communities). Image: Xavier Cortada (with the participation of physicist Pete Markowitz), "In search of the Higgs boson: H -> ZZ", digital art, 2013.