Giovanni Lamanna LAPP - Laboratoire d'annecy-le-vieux de Physique des Particules, Université de Savoie, CNRS/IN2P3, Annecy-le-Vieux, France


ERF, Big data & Open data, Brussels, 7-8 May 2014

EU-T0, Data Research and Innovation Hub

EU-T0 is a hub of knowledge and expertise that optimises the funding agencies' investment in proven e-infrastructure by broadening, simplifying and harmonising access, driven by well-defined user requirements. EU-T0 aims to build a federated, virtual European Tier-0 data-management and computing centre, implementing the connection and coordination among the major national e-infrastructures. EU-T0 thereby contributes to the vision of a general-purpose European computing and data e-infrastructure for research. [More agencies are joining.]

Visions

The e-Infrastructure Reflection Group (e-IRG) 2013 White Paper addresses the need for a new e-infrastructure paradigm for research, one that accounts for users' needs and clearly defines the roles of all stakeholders. CERN and the EIROforum members have published a vision for the evolution of the European e-infrastructure, aiming to create a sustainable IT environment open to all science communities. The vision capitalises on the investments in computing infrastructure made over the last decade and on the facilities in place to support the major European Research Infrastructures (RIs). The next generation of services for the distributed computing and storage infrastructures has to address the current limitations and take full advantage of the important advances in cloud computing and in CPU architectures. The next goal is to give Europe a state-of-the-art distributed infrastructure for Big Data sharing and analysis.

Rationale

There is considerable overlap between the research funding agencies in Particle, Nuclear and Astroparticle Physics, Cosmology and Astrophysics. APPEC promotes opportunities to cooperate in designing new software methods, computing and data-processing infrastructures for the current and future major research projects, and to gather a large cluster of major research projects and key representatives around e-Science and data-management issues. The EU-T0 federation was launched officially on 11 February 2014 by major research institutes in these domains: CERN, CIEMAT-ES, DESY-GE, IFAE-ES, IN2P3-FR, INFN-IT, KIT-GE and STFC-UK; participation is going to be extended to more institutes. The approach aims to be the core of a larger multi-domain federation.

Implementation

Bringing the research communities closer to each other to support their needs and to:
- avoid fragmentation and duplication;
- increase cross-fertilisation;
- share standards, expertise and developments;
- provide and share services;
- promote outstanding computing centres (CCs) in Europe.
Following a data- and researcher-centric approach, the EU-T0 e-infrastructure accounts for the user needs and expectations of the research communities committed to major ESFRI RIs and ERFs.

[Diagram: research communities and research agencies, with projects such as LHC and CTA, feeding federated Computing Centres; excellence in science, data management, software, services and computing; long-tail e-initiatives (EGI, EUDAT, etc.) providing multidisciplinary support, inspiring and feeding the federated CCs as a Tier-0 Centre of Excellence and bringing into operation a global European e-infrastructure for Research.]

Implementation: E-INFRASTRUCTURES

A few pillar projects have just started:
- The EU-T0 data backbone: heterogeneous storage managed in a federated way; interoperable (EUDAT-compatible); fulfilling the real-time ingestion and archive-access requirements.
- The EU-T0 cloud: a highly hybrid distributed computing architecture, with scheduling and virtualisation configurable for all users, also serving long-tail science.
- The EU-T0 VRE and software: organisation, repositories and provision of services, data preservation, and provision of a new software-programming test-bed across communities.
- The EU-T0 training: building the data-scientist profile and promoting careers.
- The EU-T0 pilot: a new business model and interaction with the private sector for co-developments around big-data services and cloud (Helix Nebula).

Implementation: EU-T0 IN THE ECOSYSTEM

There is an important need for a number of coordination activities embracing all e-infrastructures globally:
a) organisation and repositories for sharing services, software and data across communities;
b) recognising shared/common needs for coordinated developments and services;
c) efforts to develop new services/tools where needs are identified;
d) policy development (shown to be essential for successful infrastructures);
e) security coordination: policy development and incident handling;
f) collaboration beyond Europe and with more scientific communities;
g) integration with other forms of e-infrastructure, including HPC and volunteer computing;
h) coordination/negotiation with industry;
i) creation of a training network to develop the key skills needed for the future, including creating data scientists.

EU-T0: e-infrastructures background

The EU-T0 partners are the research agencies that own large computing centres (CC-IN2P3, CNAF, GridPP, PIC, DESY Tier-2, KIT Tier-1, plus many national Tier-2 CCs) within the Worldwide LHC Computing Grid (WLCG) project, having successfully implemented a distributed computing infrastructure. They also support large Research Infrastructures (RIs), some on the ESFRI roadmap, in Astroparticle Physics and Cosmology (AMS, AUGER, H.E.S.S., MAGIC, CTA, FERMI, KM3NeT, SKA, VIRGO/EGO and future gravitational-wave projects, PLANCK, EUCLID, LSST) and in photon science (XFEL, etc.). The EU-T0 hub is built on gathered resources currently of the order of hundreds of thousands of processing cores, targeting half a million cores of computing resources in the next few years. EU-T0 archives big, heterogeneous and complex data through storage resources of the order of some hundreds of petabytes, which will grow to the exabyte scale in the coming years.

Particle Physics

High-capacity networks are changing the participation of CCs and are impacting the LHC experiments' computing models. The LHC experiments' data throughput is already doubling in 2015, and resource needs will increase further in the coming years (millions of CPU cores and hundreds of PB of storage). In order to exploit technological improvements we need to evolve the code and frameworks, at least to:
- fully use many-core devices;
- move big data efficiently in the cloud paradigm;
- solve high-speed disk access.
Data sharing, preservation and long-term open access are new issues in the domain (see DPHEP talk).
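As a hedged illustration of the many-core point above, event-level parallelism of the kind the experiments' frameworks are evolving towards can be sketched with Python's standard multiprocessing module. The event model and the reconstruct() function are invented placeholders, not any experiment's actual code:

```python
# Illustrative sketch only: fan independent "events" out over all
# available cores. reconstruct() stands in for a CPU-heavy per-event
# reconstruction step; real frameworks are far more involved.
from multiprocessing import Pool
import os

def reconstruct(event):
    # Placeholder for per-event processing (here: a toy computation).
    return sum(x * x for x in event)

def process_run(events):
    # Events are independent, so they parallelise trivially across cores.
    with Pool(processes=os.cpu_count()) as pool:
        return pool.map(reconstruct, events)

if __name__ == "__main__":
    events = [[float(i), float(i + 1)] for i in range(1000)]
    results = process_run(events)
    print(len(results))  # one result per event
```

The point of the sketch is structural: once the per-event step is a pure function, filling every core is a scheduling problem rather than a physics-code problem.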

Astroparticle Physics

New RIs (CTA, EUCLID, LSST, SKA, GWs...) are data-intensive and have to deal with Big Data management issues:
- pipelines from remote sensors at rates of tens of GB/s;
- on-site quick pre-processing and scientific-alert management;
- huge archives and important data-discovery requirements (hundreds of PB/year);
and with Open Access issues:
- dissemination, curation and mining of high-level data products, addressed through open-access software, tools and services and the coherent framework of the Virtual Observatory (see BD in Astronomy talk).
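The orders of magnitude quoted above are consistent with each other, as a few lines of arithmetic show. The 10 GB/s figure is an assumed example taken from the "tens of GB/s" range on the slide:

```python
# Back-of-the-envelope check: what a continuously archived sensor
# stream at an assumed 10 GB/s accumulates in a year.
GB = 10**9
PB = 10**15

rate_bytes_per_s = 10 * GB            # assumed example rate
seconds_per_year = 365 * 24 * 3600
yearly_volume_pb = rate_bytes_per_s * seconds_per_year / PB

print(f"~{yearly_volume_pb:.0f} PB/year")
```

A single such stream already lands in the "hundreds of PB/year" regime, which is why the slide pairs the raw rates with on-site pre-processing before archiving.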

Astroparticle Physics

The Observatory functioning mode (including limited data-proprietary periods) and the multi-wavelength scientific analyses required by researchers imply important common developments:
- advanced, high-speed, high-capacity international networks among the CCs archiving and processing data;
- a functional Authorisation, Authentication and Accounting (AAA) infrastructure;
- research projects requiring access to all of the data at once will also need significant computational resources to achieve their goals -> cloud services and new workflow management on distributed compute-intensive systems;
- efficient petascale DB queries for data analysis are critical -> common work on the design and testing of extremely large databases.

Research Infrastructures

A few examples of projects and issues:

CTA:
- The pipelines will rely on a few CCs archiving O(100) PB of raw data, and need an efficient metadata model for data extraction and user access.
- A worldwide community dealing with proprietary data (privileged users) and observatory products (guest observers); an AAI is needed.

EUCLID:
- Implementing a computing model in which data archives and data processing are distributed among national CCs, producing ~500 PB of data from hundreds of TB of raw data, with a central metadata system.

LSST:
- 15 TB/night over 10 years; final image collection: 0.5 exabytes; ~10 million alerts on transient events per night; a catalogue of ~37 billion objects.
- Extremely large databases: building petascale systems (i.e. Qserv, a highly scalable distributed database system) to handle astrophysics queries.
- Designing a coherent approach to run cross-queries between them.

Gravitational Waves:
- Creating an international network of CCs for sharing data from different antennas and for combined observations.
- Long-term preservation policies for the important software tools used to read, manage and analyse the data (used for important discoveries). [...]
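As a rough consistency check of the LSST figures above (assuming, for simplicity, observations on all 365 nights per year, which overestimates real observing time):

```python
# Rough check of the LSST volumes quoted above.
TB = 10**12
PB = 10**15
EB = 10**18

nightly = 15 * TB                # 15 TB/night, from the slide
nights = 10 * 365                # 10-year survey, every night assumed
raw_total_pb = nightly * nights / PB
print(f"raw image stream over the survey: ~{raw_total_pb:.0f} PB")

# The 0.5 EB "final image collection" on the slide is roughly an
# order of magnitude larger than the raw stream, presumably because
# it also counts processed data releases and replicas.
ratio = (0.5 * EB) / (nightly * nights)
print(f"final collection / raw stream: ~{ratio:.0f}x")
```

So the nightly rate alone accounts for tens of PB; the half-exabyte collection and the ~37-billion-object catalogue are what motivate the petascale database work (Qserv) listed above.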

Conclusions

EU-T0 is a new path towards a hub of knowledge and expertise that optimises the funding agencies' investment in proven e-infrastructures, driven by well-defined user requirements (including those concerning Big Data and Open Access). Resources are a combination of hardware, software and skills. Data management and data challenges are part of our ESFRI RIs and ERFs. Excellence in science is based on excellent data, produced by excellent facilities and managed by excellent e-infrastructures. They deserve support in Europe for frontier developments with potential transversal applications.