The creation of a Tier-1 Data Center for the ALICE experiment in the UNAM
Lukas Nellen, ICN-UNAM (lukas@nucleares.unam.mx)
3rd BigData BigNetworks Conference, Puerto Vallarta, April 23, 2015

Who Am I?
- ALICE: Mexican coordinator for the ALICE data centre project
- Pierre Auger Observatory: analysis software, data management, collaboration governance, data analysis
- HAWC observatory: data centre @ ICN, site computing and networking, collaboration governance, data analysis
- UNAM Super Computing Committee

ALICE @ LHC Detector Computing

LHC and Experiments
- Exploration of a new energy frontier in p-p and Pb-Pb collisions
- LHC: 27 km circumference; p-p, heavy ions, p-A; 14 TeV centre-of-mass energy; bunch collision rate 40 MHz
- CERN: ~10000 scientists
- ATLAS, CMS: general purpose, p-p, heavy ions; new physics: Higgs boson, supersymmetry
- LHCb: p-p; B-physics, CP violation (matter-antimatter asymmetry)
- ALICE: heavy ions, p-p; quark-gluon plasma (the state of matter of the early universe)

The ALICE collaboration and data flow
- Collaboration: ~1/2 the size of ATLAS or CMS, ~2x LHCb; 1200 people, 36 countries, 131 institutes
- Data flow: 8 kHz (160 GB/s) into level 1 (special hardware); 200 Hz (4 GB/s) into level 2 (embedded processors); 50 Hz (2.5 GB/s) into level 3 (HLT); 50 Hz (1.25 GB/s) to data recording & offline analysis
- Detector: total weight 10,000 t, overall diameter 16 m, overall length 25 m, magnetic field 0.4 T
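As a quick sanity check on those figures, a short Python sketch derives the average event size implied at each stage of the chain; it uses only the nominal rates and bandwidths quoted on the slide, and the per-event numbers are derived here, not stated in the talk.

    # Implied average event size at each stage of the ALICE data flow,
    # computed from the nominal rates and bandwidths quoted on the slide.
    stages = [
        ("into level 1 (special hardware)",          8000, 160e9),
        ("into level 2 (embedded processors)",        200,   4e9),
        ("into level 3 (HLT)",                         50, 2.5e9),
        ("to data recording & offline analysis",       50, 1.25e9),
    ]
    for name, rate_hz, bytes_per_s in stages:
        print(f"{name}: {bytes_per_s / rate_hz / 1e6:.0f} MB/event on average")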

The data challenge in HEP: today, one year of LHC data is ~25 PB.

ALICE data centres: Tier structure
- Tier 0: CERN and the Wigner Research Centre (Hungary); first copy of the data; first-pass reconstruction; data distribution to Tier-1 centres
- Tier 1: 13 centres worldwide, 7 for ALICE, none in the Americas; data copy; reconstruction and simulation; data distribution to Tier-2 centres
- Tier 2: institutions and universities, ~160 centres
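The same hierarchy can be written down as a minimal Python structure, purely as a summary of the roles listed on this slide (Tier-2 duties are not spelled out here, so they are left empty):

    # Tier roles as summarised on this slide (ALICE view of the WLCG hierarchy).
    tiers = {
        "Tier 0": {
            "sites": "CERN + Wigner Research Centre (Hungary)",
            "roles": ["first copy of the data", "first-pass reconstruction",
                      "data distribution to Tier-1 centres"],
        },
        "Tier 1": {
            "sites": "13 centres worldwide, 7 for ALICE, none in the Americas",
            "roles": ["data copy", "reconstruction and simulation",
                      "data distribution to Tier-2 centres"],
        },
        "Tier 2": {
            "sites": "~160 institutions and universities",
            "roles": [],   # not listed on the slide
        },
    }
    for name, info in tiers.items():
        print(name, "-", info["sites"])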

WLCG tier hierarchy (diagram)

The ALICE Grid sites
- 8 in North America
- 53 in Europe
- 10 in Asia
- 2 in South America
- 2 in Africa
- Mexico

ALICE computing @ UNAM

Pre-history: ALICE Grid node @ ICN-UNAM
- Started in 2006 with 32 Xeon cores (32-bit, 2.4 GHz, 1.5 GB RAM/core); upgraded to 64-bit nodes in 2008
- 1 TB storage element
- EELA resource centre
- ALICE VO-box
- Suffered from high network latency: mixed routing (commodity and CUDI networks), low bandwidth
- Work on improving networking: routing, TCP stack tuning (see the sketch below)
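The TCP tuning mentioned above is largely about matching socket buffers to the bandwidth-delay product of a high-latency path. A minimal sketch, assuming a 60 Mbit/s link and a 150 ms round trip to CERN (both numbers are illustrative assumptions, not measurements from the talk):

    # Bandwidth-delay product: the TCP window needed to keep a long, fat link full.
    def tcp_buffer_bytes(bandwidth_mbit_s: float, rtt_ms: float) -> float:
        """Minimum TCP window (bytes) to sustain the given bandwidth at the given RTT."""
        return bandwidth_mbit_s * 1e6 / 8 * rtt_ms / 1e3

    # Assumed example: 60 Mbit/s with 150 ms round trip needs ~1.1 MB of window,
    # far above the classic 64 KB default without window scaling.
    print(f"{tcp_buffer_bytes(60, 150) / 1e6:.2f} MB minimum TCP window")

Without window scaling and buffers of roughly this size, a single TCP stream cannot fill such a link, which is why routing and stack tuning mattered as much as raw bandwidth.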

Networking improvements
- ALICE computing has been an important driver behind improvements of the WAN
- A second E3 link to CUDI in 2008 doubled UNAM bandwidth to 60 Mbit/s
- New fibre from ICN to DGSCA to reach 1 Gbit/s and beyond
- Dedicated HPC network segment @ ICN
- Cluster expanded in 2010
- UNAM acquired a 1 Gb/s link to San Antonio, fully operational in April 2013
- Main users now: ALICE, HAWC

The idea for a Tier-1 data centre
- October 2010: ALICE contacts UNAM; recognises the previous experience; looking to expand computing resources; an opportunity for UNAM to acquire know-how
- January 2011: the workshop "Grid Computer Center of the Americas" defines the initial goals: 1000 cores, 1 PB storage; start as a Tier-2 centre, then move up to Tier-1

Hardware purchases
- Original plan: bundle the purchase with the renewal of UNAM supercomputing (Miztli)
- Optimising cost and resources: purchase dedicated equipment; xrootd/EOS storage; Ethernet only
- Use a small number of nodes from Miztli to construct a prototype: gain experience, start evaluation of the network

The Canek cluster
- Hardware arrived in March 2014
- 512 cores (1024 threads) in 32 nodes: 2x Intel Xeon E5-2650 v2 @ 2.60 GHz, 128 GB RAM, 2x 1 TB local disk per node
- 450 TB storage in 5 servers: 2x Intel Xeon X5650 @ 2.67 GHz (12 cores / 24 threads), 24 GB RAM, 90 TB per enclosure, RAID 6
- 10 Gbps Ethernet operational internally; ready to connect the cluster at 10 Gbps to the world
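A quick consistency check of those figures, assuming the per-node and per-server breakdown is the natural reading of the slide:

    # Consistency check of the Canek cluster figures quoted above.
    nodes = 32
    cpus_per_node = 2
    cores_per_cpu = 8                      # the Xeon E5-2650 v2 is an 8-core part
    total_cores = nodes * cpus_per_node * cores_per_cpu
    total_threads = 2 * total_cores        # Hyper-Threading enabled
    print(total_cores, total_threads)      # 512 cores, 1024 threads, as quoted

    storage_servers = 5
    tb_per_enclosure = 90
    print(storage_servers * tb_per_enclosure, "TB raw")   # 450 TB, as quoted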

Current status: Tier 2
- Configured as a node for ALICE
- UNAM-CERN MoU for a Tier-2 data centre signed in November 2014
- Presence of CERN's scientific director at UNAM
- Regular operations

ALICE computing in North America

Hardware expansion towards a high-end Tier-2: plans for 2015
- Expand storage: currently about 450 TB available; cross the 1 PB threshold in 2015; continuous expansion will be needed in the future, since the LHC will produce data for 20 years
- Double processing power: currently 512 hyper-threaded cores, 1024 job slots, 4900 HEP-SPEC06 (see the estimate below); in the middle of the playing field, with room to move up
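For scale, the benchmark figure per job slot implied by those numbers, and the effect of the planned doubling (simple linear scaling is assumed here; the doubled figure is a plan, not a measurement):

    # Implied performance per job slot and the planned 2015 expansion.
    hep_spec06 = 4900
    job_slots = 1024
    print(round(hep_spec06 / job_slots, 1), "HEP-SPEC06 per hyper-threaded job slot")  # ~4.8
    print(2 * hep_spec06, "HEP-SPEC06 after doubling processing power (assumed scaling)")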

The Future: ALICE UNAM data centre

ALICE Run 2 detector upgrades
- TPC, TRD readout electronics consolidation
- +5 TRD modules: full azimuthal coverage
- +1 PHOS calorimeter module
- + DCal calorimeter
- Double event rate => increased capacity of the HLT system and DAQ
- Rate up to 8 GB/s to T0
(slide from Predrag Buncic, Plans for Run 2 & 3)

ALICE preparations for Run 2
- Expecting increased event size: 25% larger raw event size due to the additional detectors; higher track multiplicity with increased beam energy and event pile-up
- Concentrated effort to improve the performance of the ALICE reconstruction software: improved TPC-TRD alignment; TRD points used in the track fit to improve momentum resolution for high-pT tracks; streamlined calibration procedure
- Reduced memory requirements during reconstruction and calibration (~500 MB; resident memory below 1.6 GB, virtual memory below 2.4 GB)
(slide from Predrag Buncic, Plans for Run 2 & 3)

ALICE Run 3: paradigm shift
- Now: reduce the event rate from 40 MHz to ~1 kHz; select the most interesting particle interactions; reduce the data volume to a manageable size
- After 2018: higher interaction rate, more violent collisions, more particles, more data (1 TB/s)
- The physics topics require measurements characterised by a very small signal/background ratio and large statistics; with such a large background, traditional triggering or filtering techniques are very inefficient for most physics channels
- Read out all particle interactions (Pb-Pb) at the anticipated interaction rate of 50 kHz
- Data rate increase: x100
(slide from Predrag Buncic, The ALICE Online-Offline Project)
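A back-of-the-envelope number from the rates quoted above; the per-interaction size is derived here, not stated in the talk:

    # Run 3 continuous readout, using the figures quoted on the slide.
    interaction_rate_hz = 50e3      # anticipated Pb-Pb interaction rate
    data_rate_bytes_s = 1e12        # ~1 TB/s of detector data
    print(data_rate_bytes_s / interaction_rate_hz / 1e6, "MB per interaction (derived)")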

Why insist on a Tier-1 data centre? Needed. Challenging. Possible.

The need for a new Tier-1 for ALICE
- Second copy of the data: Tier-1 centres are responsible for the safe-keeping of raw data; more space is needed to hold backup copies
- No Tier-1 data centre for ALICE in the Americas: support regional distribution
- Additional processing power for the collaboration: collaborators are expected to contribute; this will be the Mexican contribution in computing

Challenges
- A new step in advanced computing in Mexico: data-intensive science, intense network usage
- A motor to drive the development of infrastructure: networks, data centres; attract new users
- Provide a high-level, reliable service: high uptime, short response time to problems, under international scrutiny

The project is possible
- DGTIC-UNAM: experience in providing advanced computing services
- ICN-UNAM: experience in providing grid services
- Trained personnel available
- Network infrastructure improving: 10 Gbps academic networking coming to Mexico
- Political support for the project

Backup system
- Have to back up data; be prepared for the multi-PB scale
- Traditional technology: tape; an additional technology, not presently in use at UNAM
- New possibility: disk-based hierarchical storage; builds on local knowledge; we could become pioneers of the new technology
- Evaluating options; looking for funding

Operational challenges
- Need a short response time: more personnel needed; we have trained experts but lack operators for routine monitoring and first-level attention to problems
- High uptime expected: stability, spares, maintenance; stricter than for most (all?) academic computing centres in Mexico
- Budget

Occasionally asked questions
- Why a Tier-1 in Mexico? It can be done: proof of Mexico's technological abilities. Each country contributes to the computing power of the collaboration; it is one of the contributions expected from a mature country.
- Why not use commercial computing, e.g., cloud services? Rule of thumb: running your own installation is more economical if you manage to keep it more than ~75% used (see the sketch below). A local installation also provides the opportunity to train new experts in computing.
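A toy version of that rule of thumb; the hourly prices below are placeholders chosen only to put the break-even point near 75% utilisation, not figures from the talk or from any real provider:

    # Toy cost comparison behind the ~75% utilisation rule of thumb.
    owned_cost_per_core_hour = 0.03   # assumed: amortised hardware + power + staff
    cloud_cost_per_core_hour = 0.04   # assumed: on-demand commercial price

    def effective_owned_cost(utilisation: float) -> float:
        """Cost per *used* core-hour of an owned cluster: idle hours are still paid for."""
        return owned_cost_per_core_hour / utilisation

    for u in (0.6, 0.8, 0.95):
        own = effective_owned_cost(u)
        choice = "own installation" if own < cloud_cost_per_core_hour else "cloud"
        print(f"utilisation {u:.0%}: {own:.3f} per used core-hour -> {choice} cheaper")

With these placeholder numbers the owned cluster wins only above 75% utilisation, which is the structure of the argument made on the slide.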

Local synergy / spin-off
- HAWC: primary data centre at ICN-UNAM; ~700 TB on disk, preparing to reach 2 PB of installed space
- Dark Energy Spectroscopic Instrument: Mexican collaborators starting; they approached us for support
- Pierre Auger Observatory: grid node, supporting production

Conclusions
- Successfully operating a Tier-2 data centre for ALICE
- Developing a full Tier-1 data centre is challenging but possible
- Front-line projects provide a stimulus for development: benefit from the experience of High Energy Physics in handling Big Data; expand infrastructure; attract and support new users and communities; a challenge for the network infrastructure; complements traditional supercomputing