
Netherlands Institute for Radio Astronomy
Update LOFAR Long Term Archive
May 18th, 2009, Hanno Holties

LOFAR Long Term Archive (LTA) Update
- Status
- Architecture
- Data Management
- Integration: LOFAR, Target, EGEE
- Data Challenges
- Computational Facilities
- Planning

The LOFAR CS1 archive
- Archived data (~20 TB) stored on Dutch GRID facilities
  - A GRID certificate is required for access
  - Most data migrated to tape: (very) high latency, staging required for efficient downloading
- Some data available from the Astro-Wise prototype
  - Visibility data only
  - Incomplete: not yet automatically updated
- 10 Gbps connection CIT - SARA operational
  - Used with 10 file servers in data challenges
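The staging step above matters in practice: a tape-resident file cannot be downloaded efficiently until it has been brought online. The toy Python sketch below, with an entirely fake in-memory archive standing in for the certificate-protected Grid storage, only illustrates the request-everything-up-front-then-poll pattern; the class and function names are illustrative and not part of any LOFAR tooling.

```python
# Toy illustration of the staging pattern: request tape recalls for all files
# first, then poll and download whatever has become available on disk.
import random
import time

class FakeTapeArchive:
    """Stands in for the tape-backed archive; staging completes after a random delay."""
    def __init__(self):
        self.ready_at = {}

    def request_stage(self, name):
        # a real archive would queue a tape recall here; we just pick a completion time
        self.ready_at[name] = time.time() + random.uniform(0.1, 0.5)

    def is_staged(self, name):
        return time.time() >= self.ready_at.get(name, float("inf"))

def fetch_all(archive, names, poll=0.1):
    for name in names:                   # request staging for everything up front...
        archive.request_stage(name)
    pending = set(names)
    while pending:                       # ...then poll and "download" whatever is ready
        for name in sorted(pending):
            if archive.is_staged(name):
                print(f"downloading {name}")
                pending.remove(name)
        time.sleep(poll)

fetch_all(FakeTapeArchive(), ["SB000.MS", "SB001.MS", "SB002.MS"])
```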

LOFAR LTA Target/AstroWise Prototype
- Collaboration with ASTRO-WISE (courtesy W.-J. Vriend and J. McFarland)

The LOFAR LTA Characteristics
- Large (estimated yearly growth: 2.5 Petabyte)
  - Nevertheless, storage (& computing) is a scarce resource, allocated by the Program Committee
- Mix of technologies
  - Tape: cheap(er) & slow(er)
  - Disk: expensive & fast(er)
- Distributed
  - Groningen (CIT)
  - Amsterdam (SARA)
  - Jülich
- Integrated processing facilities
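To put the growth figure in perspective, a short back-of-the-envelope sketch of how a 2.5 PB/year ingest might be split over media and sites. Only the 2.5 PB/year number comes from the slide; the tape/disk ratio and per-site shares below are purely illustrative assumptions.

```python
# Back-of-the-envelope split of the yearly ingest. The fractions are assumptions
# for illustration, not the actual LTA allocation.
YEARLY_GROWTH_PB = 2.5                   # from the slide

tape_fraction = 0.9                      # assumption: bulk of the archive on cheap(er) tape
site_shares = {"SARA": 0.5, "CIT/Target": 0.3, "Juelich": 0.2}  # assumption

tape_pb = YEARLY_GROWTH_PB * tape_fraction
disk_pb = YEARLY_GROWTH_PB - tape_pb
print(f"Per year: {tape_pb:.2f} PB to tape, {disk_pb:.2f} PB on disk")
for site, share in site_shares.items():
    print(f"  {site}: {YEARLY_GROWTH_PB * share:.2f} PB")
```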

The LOFAR LTA Activities
- LOFAR/BiG Grid (ASTRON, CIT, SARA, Nikhef)
  - Blueprint defined
  - Working on detailed design
  - Acquisition of systems later this year
  - Plan to provide 200-300 TB of storage this year
- Target (CIT, RUG, ASTRON)
  - Budget awarded
  - Working on detailed design
  - Plan to provide the central database & file server this year
- GLOW (Jülich)
  - Being discussed
  - Plan to provide 1 PB of storage

The LOFAR LTA - The Central Processing facility
[Diagram: BG/P online processing, temporary storage, offline processing cluster and storage connected through a switch, with link speeds of 50, 100, and 20 Gbps; 42 Tflops of processing]
- Final configuration (2010): 2 PB, 10+ Tflops
- Output of online processing: ~50 Gbps (0.5 PB per day!)
- Expect typically 1-2 weeks retention time
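A quick arithmetic check of these numbers (the rates and sizes are from the slide, the calculation is mine): 50 Gbps of online-processing output indeed corresponds to roughly half a petabyte per day, and a 2 PB temporary store holds only a few days of output at that peak rate, so the 1-2 week retention time presumes an average output rate well below the peak.

```python
# Verify that 50 Gbps of output is ~0.5 PB/day and see how long a 2 PB buffer lasts
# at that peak rate.
GBPS = 50
SECONDS_PER_DAY = 86_400
PB = 1e15

bytes_per_day = GBPS / 8 * 1e9 * SECONDS_PER_DAY
print(f"Output at {GBPS} Gbps: {bytes_per_day / PB:.2f} PB/day")          # ~0.54 PB/day
print(f"2 PB buffer fills in {2 * PB / bytes_per_day:.1f} days at peak")  # ~3.7 days
```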

The LOFAR LTA Architecture
[Architecture diagram. Components: NL and international stations; temporary storage & processing; the LOFAR Observatory with operations, monitoring & control and science support; the LTA metadata & file catalog; general storage & processing and specific-project storage & processing hosted at LOFAR data centers (including an International LOFAR Data Center and data centers for specific projects); a Science & VO portal for any LOFAR user and project portals for specific LOFAR (key) project users]

LOFAR LTA Tier 1 Integration
- LTA/Tier 1 site integration
- LOFAR LTA/Tier 1 file catalog integration
- LOFAR LTA/Tier 1 file system integration

The LOFAR LTA Data management
- Data is retrieved from the archive (not from CEP)
- Data owned by LOFAR
  - A proprietary period may apply
  - Most data will become publicly available
- Security applied throughout the LTA
  - Uses LOFAR user accounts
  - Use of personal (GSI) certificates is an option
  - Authorization based on project administration
  - Resource allocation & usage accounted
- Central file catalog
  - Keeps track of the location(s) of data stored in the LTA
  - Provides references to physical file locations for retrieval
- Derived data products can be added to the archive (by the user)
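The central file catalog described above is essentially a mapping from logical data products to one or more physical replicas at the LTA sites. The sketch below shows that idea with SQLite; the schema, site names, and file names are illustrative assumptions, not the actual LTA catalog design.

```python
# Minimal sketch of a replica catalog: one logical data product, several physical
# locations, and a lookup that returns the replicas available for retrieval.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dataproduct (id INTEGER PRIMARY KEY, project TEXT, name TEXT);
CREATE TABLE replica (
    dataproduct_id INTEGER REFERENCES dataproduct(id),
    site TEXT,               -- e.g. SARA, CIT, Juelich
    uri  TEXT                -- physical location (SRM/GridFTP URL, path, ...)
);
""")
con.execute("INSERT INTO dataproduct VALUES (1, 'LOFAR_KSP1', 'L2009_12345_SB000.MS')")
con.execute("INSERT INTO replica VALUES (1, 'SARA', 'srm://srm.example.nl/lofar/L2009_12345_SB000.MS')")

def locate(name):
    """Return all physical replicas registered for a logical data product."""
    rows = con.execute("""SELECT r.site, r.uri FROM replica r
                          JOIN dataproduct d ON d.id = r.dataproduct_id
                          WHERE d.name = ?""", (name,))
    return rows.fetchall()

print(locate("L2009_12345_SB000.MS"))
```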

LOFAR LTA Data streams
[Data-flow diagram; no transcribed text]

LOFAR LTA Data challenges
1) Available effective bandwidth CEP - Tier 1: memory-to-memory; aim: > 9 Gbps
2) Basic file transfer: disk-to-disk; aim: > 9 Gbps
3) Stress test, file transfers to online systems: 24 hour, disk-to-disk; aim: 2 Gbps sustained
4) Basic beginning-to-end file transfer: disk-to-tape; aim: 2 Gbps sustained
5) Operational use case: 24 hour, disk-to-tape; aim: > 30 TB transferred
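For illustration, a toy version of the kind of measurement behind challenge 1: stream a buffer over a TCP socket and report the achieved rate. The real challenges ran many parallel streams between CEP and the Tier 1 file servers over the 10 Gbps link; this single-stream loopback sketch (port number and transfer size are arbitrary choices) only shows the measurement itself.

```python
# Single-stream memory-to-memory throughput test over loopback.
import socket, threading, time

PORT = 50507                                  # arbitrary free port (assumption)
CHUNK = memoryview(bytes(4 * 1024 * 1024))    # 4 MiB buffer sent repeatedly
TOTAL_BYTES = 2 * 1024**3                     # send 2 GiB in total

def sink():
    with socket.create_server(("127.0.0.1", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            while conn.recv(1024 * 1024):     # drain and discard incoming data
                pass

threading.Thread(target=sink, daemon=True).start()
time.sleep(0.2)                               # give the server a moment to start listening

sent, start = 0, time.perf_counter()
with socket.create_connection(("127.0.0.1", PORT)) as c:
    while sent < TOTAL_BYTES:
        c.sendall(CHUNK)
        sent += len(CHUNK)
elapsed = time.perf_counter() - start
print(f"{sent * 8 / elapsed / 1e9:.2f} Gbps memory-to-memory (loopback)")
```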

LOFAR LTA Data challenges CEP - SARA
NB: 10 old servers available, using spare Gbps ports
Results:
1) Memory-to-memory: 8.4 Gbps
   - Probable bottleneck: the network cards used at CEP
2) Disk-to-disk: 4.8 Gbps
   - Bottleneck: the file servers at the SARA side
3) Stress test, disk-to-disk: 2.5 Gbps (not optimized); > 5 Gbps in burst mode
4) Disk-to-tape: 0.8 Gbps
   - Limited by the available SARA tape systems
5) On hold
   - Waiting for new CEP systems & SARA LHC tape units

LOFAR LTA Data challenges CEP - SARA
3) 24-hour disk-to-disk test
   - The 4 SARA file servers also provide shared disk storage

LOFAR LTA Data challenges CEP - SARA
4) Disk-to-tape: 3-stage process
   i) Storage on disk-based file servers
   ii) Automatic migration to the tape staging area (disk)
   iii) Migration to tape (2x 60 MBps units)
   Room for improvement!
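A small sanity check on the tape stage (the 2x 60 MBps figure is from the slide, the arithmetic is mine): two drives at 60 MB/s give a ceiling of just under 1 Gbps, which is consistent with the 0.8 Gbps measured in challenge 4 and explains why additional or faster tape units are the obvious place to improve.

```python
# Theoretical ceiling of the final migration stage with two 60 MB/s tape units.
drives, mb_per_s = 2, 60
ceiling_gbps = drives * mb_per_s * 1e6 * 8 / 1e9
print(f"Tape ceiling: {ceiling_gbps:.2f} Gbps; measured: 0.8 Gbps")
```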

LOFAR LTA Processing in the LTA
- Still many uncertainties (what exactly is needed?)
- Some typical (?) processing steps have been considered:
  - Calibration: BBS estimates (mostly embarrassingly parallel)
  - Projection: 5 TB visibility dataset (can be parallelized)
  - FFT: 10,000 x 10,000
- Preliminary conclusions:
  - Likely to fit on standard 8-core (dual quad-core) nodes with 16 or 32 GB of RAM
  - Bandwidth limited: requires a fast network to keep compute nodes busy
  - MPI & SMP needed to use cores efficiently with 10 GB+ datasets
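The "mostly embarrassingly parallel" calibration case can be pictured with a minimal sketch: independent chunks (say, subbands of a visibility dataset) are farmed out to the eight cores of a single node. The chunking, the calibrate() body, and the chunk count below are placeholders; scaling beyond one node would, as noted above, need MPI, and I/O bandwidth rather than CPU is the expected limit.

```python
# Embarrassingly parallel per-chunk processing on one 8-core node.
from multiprocessing import Pool

def calibrate(chunk_id: int) -> str:
    # placeholder for a per-chunk calibration step (e.g. a BBS-like solve)
    return f"chunk {chunk_id:03d} calibrated"

if __name__ == "__main__":
    chunks = range(64)               # assumption: 64 independent subband chunks
    with Pool(processes=8) as pool:  # dual quad-core node as on the slide
        for result in pool.imap_unordered(calibrate, chunks):
            print(result)
```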

LOFAR LTA Planning
- Connection CEP - Tier 1: SARA (update), TARGET/CIT, Jülich
- Data challenges
- Security implementation
  - Central LOFAR identity management
  - Provisioning through the proposal application
- Integration with Tier 1 facilities
- Ingest procedure
  - Bulk data transfers to distributed Tier 1 sites
  - Metadata to the central LTA database
  - Storage table (LTA central file catalog) being designed

LOFAR LTA Conclusion
- First implementation ready Q3 2009
- Data challenges progressing
  - Preliminary results promising
  - Restrictions understood
- Processing requirements in particular are uncertain
  - Need substantiation by performing realistic tests!
- BiG Grid facilities are available
  - Storage
  - Processing ( )
- Target prototype is available
- Jülich offers to provide a Petabyte