Compact Muon Solenoid: Cyberinfrastructure Solutions
Ken Bloom, UNL Cyberinfrastructure Workshop -- August 15, 2005


Computing Demands
CMS must provide computing to handle huge data rates and sizes, and strenuous demands on processing. In 2007 we expect:
- 225 MB/s raw data rate to tape (>2 PB raw data/year)
- 40,000 kSI2K of CPU resources (an Intel P4 is ~1 kSI2K)
- 14,000 TB of disk (the Library of Congress is ~3,000 TB)
Many different types of data processing are required:
- Event reconstruction: CPU-intensive, large input, larger output
- First-pass event selection: less CPU-intensive, large input, smaller output
- Data analysis: moderately CPU-intensive, small input, small output
- Simulations: extremely CPU-intensive, tiny input, large output
The historic computing solution: a large symmetric multiprocessor (SMP) computer attached to a large disk pool, located at a central facility.
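The headline numbers follow from simple arithmetic. The sketch below is a back-of-the-envelope check, assuming roughly 10^7 seconds of effective data-taking per year (the usual rule of thumb for accelerator live time, not a figure from this slide); with that assumption a 225 MB/s stream to tape indeed adds up to more than 2 PB of raw data per year.

    # Back-of-the-envelope check of the CMS 2007 computing requirements.
    # Assumption: ~1e7 seconds of effective data-taking per year (accelerator live time).

    raw_rate_mb_per_s = 225          # raw data rate to tape, MB/s
    live_seconds_per_year = 1.0e7    # assumed effective running time per year

    raw_data_tb_per_year = raw_rate_mb_per_s * live_seconds_per_year / 1.0e6  # MB -> TB
    raw_data_pb_per_year = raw_data_tb_per_year / 1000.0
    print(f"Raw data to tape: ~{raw_data_pb_per_year:.2f} PB/year")  # ~2.25 PB/year

    # Scale of the disk requirement relative to the Library of Congress estimate.
    disk_tb = 14000
    loc_tb = 3000
    print(f"Disk requirement: ~{disk_tb / loc_tb:.1f}x the Library of Congress")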

New Solutions for New Times
A centralized SMP is good for ease of data access and operational economics. But:
- Physical infrastructure is a precious resource -- power and cooling are expensive
- Commodity machines with fewer CPUs are a lot cheaper
- Operational expertise may be distributed
- Institutions and countries are more willing to pay for local computing than for central computing elsewhere
- There is a tendency to buy for peak load, but the load is not always at peak -- better to share computing with scientists on different schedules
- Fast networks allow data and jobs to be shipped anywhere!
This suggests a different way to do business.

A Tiered Solution
[The original slide shows the standard LHC tiered-computing diagram: the experiment's online system handles ~1 PB/s off the detector and feeds the Tier 0+1 center at CERN (PBs of disk, tape robot) at ~100-1500 MB/s; Tier 1 centers (FNAL, IN2P3, INFN, RAL) connect to CERN at ~2.5-10 Gbps; Tier 2 centers, UNL among them, connect to Tier 1 centers at ~2.5-10 Gbps; Tier 3 institute physics data caches and Tier 4 workstations connect at 0.1 to 10 Gbps.]
Each tier has a different responsibility:
- Tier 0: first-pass reconstruction, write data to tape, send copies to Tier 1
- Tier 1: data reduction, skimming, bulk re-processing, archiving
- Tier 2: hosting of smaller samples, data analysis, simulations
Resource ratios: CERN/outside ~1:2; Tier 0 / (Σ Tier 1) / (Σ Tier 2) ~1:1:1.
Scale: tens of petabytes by 2007-8; an exabyte ~5-7 years later.
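The tier structure is easiest to see written out as data. The sketch below is a minimal, purely illustrative summary of the hierarchy on this slide; the site names and bandwidth figures come from the diagram, while the data structure and labels are just presentation.

    # Minimal sketch of the tiered CMS computing model described on this slide.
    # Site lists and bandwidth figures are taken from the slide; the structure
    # itself is purely illustrative.

    TIERS = {
        "Tier 0 (CERN)": {
            "role": "first-pass reconstruction, write raw data to tape, ship copies to Tier 1",
            "uplink": "~100-1500 MB/s from the online system",
        },
        "Tier 1 (FNAL, IN2P3, INFN, RAL, ...)": {
            "role": "data reduction, skimming, bulk re-processing, archiving",
            "uplink": "~2.5-10 Gbps to Tier 0",
        },
        "Tier 2 (UNL, ...)": {
            "role": "hosting of smaller samples, data analysis, simulations",
            "uplink": "~2.5-10 Gbps to a Tier 1",
        },
        "Tier 3/4 (institutes, workstations)": {
            "role": "local physics data caches and end-user analysis",
            "uplink": "0.1-10 Gbps",
        },
    }

    for tier, info in TIERS.items():
        print(f"{tier}: {info['role']} ({info['uplink']})")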

T2 @ UNL
UNL's bid to host a T2 center built on local strengths:
- A strong Research Computing Facility with experience in cluster computing
- Excellent support from the Office of Research
- A robust, growing particle physics research group with large CMS involvement
UNL was selected as a T2 site in January 2005, joining Caltech, MIT, UCSD, Wisconsin, Purdue and Florida; development and operations began in May 2005. Since then we have:
- Completed the first round of hardware purchases; the CPUs are dual-core Opteron chips, leading-edge technology
- Installed and commissioned major computing services, including CMS applications, data-transfer utilities and Grid interfaces
- Become an acknowledged leader among US CMS T2 sites in integration into the tiered computing system; first to perform T1 to T2 and T2 to T1 data transfers

First Production UNL T2 Facility
Hardware purchases for 2005: 64 dual-core, dual-processor Opteron nodes (~315 kSI2K) and a 20 TB disk server.
This is about 20% of the capacity/complexity of the planned 2007 system.

Tying Tiers Together I
Such a distributed system won't function unless there are technologies to connect all of the computers -- hence the Grid and its middleware.
US CMS systems will be part of the Open Science Grid, a consortium to develop infrastructure and shared resources for diverse communities of scientists. UNL is a member -- researchers of all stripes can share and borrow computing.
Some needs, and the corresponding middleware tools (see the sketch below):
- Authorization and authentication: who is allowed to borrow your CPU? Establish a virtual organization. VOMS breaks users into groups, defines roles and priorities, and generates credentials signed by a trusted authority.
- Generic interfaces to diverse resources: GRAM and SRM provide generic interfaces to the local batch queue and storage system, and can handle load balancing and traffic shaping.
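To make the authorization idea concrete, here is a toy sketch of how a user's virtual-organization membership and role might translate into a local batch priority at a site. It is purely conceptual: the VO names, roles and priority values are hypothetical, and this is not VOMS's actual interface -- real sites map VOMS attributes to local accounts and queue priorities through their own middleware configuration.

    # Conceptual illustration only: a toy authorization check in the spirit of
    # what VOMS-style credentials enable at a site. All names and numbers below
    # are hypothetical placeholders.

    VO_POLICY = {
        # (virtual organization, role) -> local batch priority
        ("cms", "production"): 10,
        ("cms", "analysis"):    5,
        ("other-vo", None):     1,
    }

    def local_priority(vo, role):
        """Map a user's VO and role attributes to a local batch priority."""
        if (vo, role) in VO_POLICY:
            return VO_POLICY[(vo, role)]
        if (vo, None) in VO_POLICY:
            return VO_POLICY[(vo, None)]
        raise PermissionError(f"VO '{vo}' is not authorized at this site")

    print(local_priority("cms", "production"))     # 10
    print(local_priority("other-vo", "analysis"))  # falls back to the VO-wide entry: 1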

Tying Tiers Together II
Everyone wants to know what's happening on everyone else's cluster:
- Describing clusters to the outside world: GLUE collects information about cluster architecture and state and publishes it.
- Monitoring and accounting: MonALISA gives real-time information about the state of individual clusters and network traffic.
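As a rough illustration of "describing clusters to the outside world", the sketch below shows the kind of summary record an information service publishes about a site. The attribute names and values are hypothetical placeholders, not the actual GLUE schema.

    # Illustrative only: the sort of cluster description that grid information
    # services publish. Attribute names are hypothetical, not the GLUE schema.

    site_info = {
        "site_name": "UNL-T2",        # hypothetical site identifier
        "batch_system": "condor",
        "total_cpus": 128,
        "free_cpus": 42,
        "waiting_jobs": 17,
        "storage_total_tb": 20,
        "storage_free_tb": 8,
    }

    def publish(record):
        """Stand-in for publishing the record to a grid information service."""
        for key, value in sorted(record.items()):
            print(f"{key}: {value}")

    publish(site_info)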

Moving and Managing Data
CMS works with datasets that are much larger than those of typical Grid applications, so special attention must be paid to moving and tracking them.
- Obviously, BIG network pipes are needed. CMS expects 2.5-10 Gb/s network connections between T1 and T2, allowing download of a new 200 TB dataset in a few days (see the estimate below).
- The pipes also need to be used effectively. The LambdaStation project coordinates networking and storage facilities, dynamically allocating network pipes for specific periods of time to optimize data blasts.
- How do you know where the data are? Life can be simplified by assigning particular datasets to particular centers, but that is not so helpful for user data that may be in several places. Research is underway into efficient cataloguing: keep track of immutable collections of files instead of individual files, and don't move anything smaller than ~1 TB.
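The "few days" claim is easy to check. The sketch below estimates the transfer time for the 200 TB dataset quoted on this slide at the two ends of the expected bandwidth range, assuming the link can be driven at its nominal rate end to end (which real transfers rarely achieve, so these are lower bounds on the time).

    # Back-of-the-envelope transfer-time estimate for a 200 TB dataset.
    # Assumes the link runs at its nominal rate end to end.

    dataset_tb = 200
    dataset_bits = dataset_tb * 1e12 * 8   # TB -> bits

    for link_gbps in (2.5, 10.0):
        seconds = dataset_bits / (link_gbps * 1e9)
        days = seconds / 86400
        print(f"{link_gbps:>4} Gb/s link: ~{days:.1f} days")

    # ~7.4 days at 2.5 Gb/s, ~1.9 days at 10 Gb/s -- i.e. "a few days".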

T2 Ramp-up and Challenges
The CMS T2 project plan shows a steep ramp-up of computing resources:
- 2005: ~200 kSI2K CPU, ~20 TB disk
- 2006: ~400 kSI2K CPU, ~70 TB disk
- 2007: 800 kSI2K CPU, 200 TB disk, 2.5 Gb/s network
Where are the challenges?
- Not the CPU -- the RCF knows how to handle that.
- Disk will be a challenge -- 200 TB can be operated, but it needs a lot of care and feeding by humans.
- We don't have the regional network capabilities yet, but expect (hope?) that they will come as demand increases over time.
- There is still much work to do in developing Grid middleware and making it robust -- it is not as easy as telnet yet; you can't just install it and walk away.
- Skilled labor is needed to make it all go -- CMS has provided funds for 2 FTE system administrators per site. Labor is the biggest cost!
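A quick calculation makes "steep" concrete: the sketch below works out the year-over-year growth factors implied by the plan figures above.

    # Year-over-year growth factors implied by the T2 ramp-up plan on this slide.

    plan = {
        2005: {"cpu_ksi2k": 200, "disk_tb": 20},
        2006: {"cpu_ksi2k": 400, "disk_tb": 70},
        2007: {"cpu_ksi2k": 800, "disk_tb": 200},
    }

    years = sorted(plan)
    for prev, curr in zip(years, years[1:]):
        cpu_growth = plan[curr]["cpu_ksi2k"] / plan[prev]["cpu_ksi2k"]
        disk_growth = plan[curr]["disk_tb"] / plan[prev]["disk_tb"]
        print(f"{prev} -> {curr}: CPU x{cpu_growth:.1f}, disk x{disk_growth:.1f}")

    # 2005 -> 2006: CPU x2.0, disk x3.5
    # 2006 -> 2007: CPU x2.0, disk x2.9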

What is Cyberinfrastructure?
- It's physical infrastructure: power and cooling -- Avery, M&P, WSEC, South Stadium
- It's computing power: faster processors, dual-core systems, 64-bit architecture
- It's storage capacity: 200 TB for the T2, possibly the largest disk server in Nebraska
- It's network infrastructure: huge datasets must be shipped around the world
- It's software and middleware: CMS-specific applications, plus tools for a functional, robust Grid
- It's human resources: we are still in an intense R&D period, and skilled labor is always in demand
The success of the CMS physics program depends on the successful development, integration and interaction of our cyberinfrastructure!