Data Staging: Moving large amounts of data around, and moving it close to compute resources

Size: px
Start display at page:

Download "Data Staging: Moving large amounts of data around, and moving it close to compute resources"

Transcription

1 Data Staging: Moving large amounts of data around, and moving it close to compute resources Digital Preserva-on Advanced Prac--oner Course Glasgow, July 19 th 2013

2 Definition Starting point Moving data around Size/Performances The user Data selection Transfer options Numbers Registered data Outline 2

3 Data movement: staging or replication? PID Safe Replica/on Data preservation and access optimization Data Staging Dynamic replication to HPC workspace for processing Data 3 3

4 Staging 4 4

5 Moving large amounts of data around Data sets can be large both in terms of Numbers of objects Single object size 5

6 All data are gray in the data staging The data types are irrelevant Except when they can affect the size or the number of the files hap:// 6

7 When I should care about data types? When you can compress them Data transfer efficiency is crucial in data staging 7

8 Time to copy 1TB 10 Mb/s network: 300 hrs (12.5 days) 100 Mb/s network: 30 hrs 1 Gb/s network: 3 hrs (are your disks fast enough?) 10 Gb/s network: 20 minutes (need really fast disks and file system) Compare these speeds to: USB 2.0 portable disk 60 MB/sec (480 Mbps) peak 20 MB/sec (160 Mbps) reported on line hours to load 1 Terabyte 8 Parallel I/O and management of large scien-fic data

9 Data Throughput Transfer Times This table available at 9 Parallel I/O and management of large scien-fic data

10 Data staging is Just in Time You do not plan it You can pre-stage sometimes There are techniques to improve the efficiency 10

11 Faster than light TCP tuning: refers to the proper configuration of buffers that correspond to TCP windowing Pipelining (of commands): speeds up lots of tiny files by stuffing multiple commands into each login session back-to-back without waiting for the first command's response 11

12 Parallelizing: on wide-area links, using multiple TCP streams in parallel (even between the same source and destination) can improve aggregate bandwidth over using a single TCP stream 12

13 Striping: data may be striped or interleaved across multiple servers 13

14 Parallelism and TCP tuning are the keys It is much easier to achieve a given performance level with four parallel connections than with one A good TCP tuning can improve drastically performances Latency interaction is critical Wide area data transfers have much higher latency than LAN transfers Many tools and protocols assume a LAN Examples: SCP/SFTP, HPSS mover protocol 14

15 Use the right tool SCP/SFTP: 10 Mb/s FTP: Mb/s assumes TCP buffer autotuning standard Unix file copy tools Parallel stream FTP: Mbps fixed 1 MB TCP window in OpenSSH only 64 KB in OpenSSH versions <

16 But efficiency is not all Easiness of use High reliability Third-party transfer Possibility to control/limit the transfer throughput to avoid engulfing the network Which priority? Different scenarios, different needs 16

17 and moving it close to compute resources 17

18 Who will move the data? Any user, part of a community or citizen scientist. In most part of the cases, a domain expert, such as a biologist, a linguist, a seismologist, not an expert of data transfer software. 18

19 Wizard gap As Ethernet speeds have increased, there has been a widening gap between ability of the novice to fully use bandwidth capabilities, compared to that of network capabilites achieved by users who have had their network tuned by an expert ('Wizard"). [Diagram c/o MaA Mathis, hap:// 19

20 Data selection Data Staging Overview 20

21 How the data sets are selected by the domain experts? Through their local tools, where local means on the community side Through remote services, generic or specific per community 21

22 Move the data 22

23 Eudat Transfer options Globus Online Third party transfer, web portal Globus client server to server to server irods specific client server to to server server HTTP options Work in progress 23

24 GridFTP: third Party Transfer Control channels GridFTP Server Data channels GridFTP Server Site A Site A 24

25 Globus OnLine Service 25

26 EUDAT PID Howto transfer #TB or PB From to HPC which site EUDAT X site Which A,B tool or C to use: GridFTP, Howto Scp, to get irods? access all sites? EUDAT site B GridFTP scp 1Gb/s WAN HPC site X EUDAT 10Gb/s WAN Globus Online irods 10Gb/s OPN EUDAT PID Bittorrent? EUDAT PID EUDAT site C EUDAT site A 26

27 Stage-out Data transport protocols: GridFTP, HTTP. SCP, FTP, RSYNC, OpenDap? WAN Which tool to manage the transfers: irods, Globus Online, FTS, RSYNC, Others.? PID EUDA T LTA EUDAT SP site A WAN Third party transfers Dedicated Network Third party transfers HPC site X HPC space EUDAT User Forum Barcelona, 7-8 March

28 Stage-out/stage-in Workflow framework Stage 1 Stage 2 Stage # PID EUDAT LTA EUDAT SP site A Data Mining cluster EUDAT User Forum Barcelona, 7-8 March

29 Multi-step staging EPCC EUDAT center Community center VPH ENES CLARIN Lifewatch CINECA BSC EPOS 29

30 Local staging targets 1. Single step 30

31 Sample Data Transfer Results Sample Results: RTT = 53 ms, network capacity = 10Gb/s. Tool Throughput scp: 140 Mb/s HPN patched scp: 1.2 Gb/s FTP: 1.4 Gb/s GridFTP, 4 streams 5.4 Gb/s GridFTP, 8 streams 6.6 Gb/s 31

32 irods irods is a data management system, which integrates a rule engine The rules can be triggered automatically based on specific events (for example a data set is moved to a particular directory) Invoked from remote via irods command line client or integrated into applications based on irods java (Jargon) and python (PyRods) API Data Staging Overview 32

33 Metadata! "Managing Large Datasets with irods a Performance Analysis" - Proceedings of the Interna-onal Mul-conference on Computer Science and Informa-on Technology pp

34 haps://openwiki.uninea.no/norstore:irods:workshop

35 Data Staging and registered data PID Data Center Store PID Data Center Store Data Center Store HPC ain of registered data 35

36 Conclusions The way to move data has to be enough flexible to accomodate different transfer protocols, different access mechanisms. Flexibility means also that the transfer tools can be used as they are, with default parameters, for average performances, but also fine tuned by experts for faster transfers. No solution fits all, so different services are provided 36

37 e- IRG Workshop - Dublin - Ireland 37 37

38 Thank you! 38

Data Staging: Moving large amounts of data around, and moving it close to compute resources

Data Staging: Moving large amounts of data around, and moving it close to compute resources Data Staging: Moving large amounts of data around, and moving it close to compute resources PRACE advanced training course on Data Staging and Data Movement Helsinki, September 10 th 2013 Claudio Cacciari

More information

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013 Data Replication: Automated move and copy of data PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013 Claudio Cacciari c.cacciari@cineca.it Outline The issue

More information

EUDAT. Towards a pan-european Collaborative Data Infrastructure. KNMI Workshop, Utrecht, Netherlands

EUDAT. Towards a pan-european Collaborative Data Infrastructure. KNMI Workshop, Utrecht, Netherlands EUDAT Towards a pan-european Collaborative Data Infrastructure Giuseppe Fiameni, Claudio Cacciari SuperComputing Centre, CINECA, Italy Peter Wittenburg, Daan Broeder The Language Archive, Max Planck Institute,

More information

EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012

EUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012 EUDAT Towards a Collaborative Data Infrastructure Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012 Big (Chaotic) Data DATA GENERATORS 1) Measurement technology. 2) Cheap

More information

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members Agenda Background information Services Common Data Infrastructure

More information

Using EUDAT services to replicate, store, share, and find cultural heritage data

Using EUDAT services to replicate, store, share, and find cultural heritage data Using EUDAT services to replicate, store, share, and find cultural heritage data in PSNC and beyond Maciej Brzeźniak, Norbert Meyer, PSNC Damien Lecarpentier, CSC 3 October 203 DIGITAL PRESERVATION OF

More information

File Transfer: Basics and Best Practices. Joon Kim. Ph.D. PICSciE. Research Computing 09/07/2018

File Transfer: Basics and Best Practices. Joon Kim. Ph.D. PICSciE. Research Computing 09/07/2018 File Transfer: Basics and Best Practices Joon Kim. Ph.D. PICSciE Research Computing Workshop @Chemistry 09/07/2018 Our goal today Learn about data transfer basics Pick the right tool for your job Know

More information

Introduction to High Performance Parallel I/O

Introduction to High Performance Parallel I/O Introduction to High Performance Parallel I/O Richard Gerber Deputy Group Lead NERSC User Services August 30, 2013-1- Some slides from Katie Antypas I/O Needs Getting Bigger All the Time I/O needs growing

More information

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics EUDAT & AAI Daan Broeder MPI for Psycholinguistics Initially six research communities on Board EPOS: European Plate Observatory System CLARIN: Common Language Resources and Technology Infrastructure ENES:

More information

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure EUDAT Towards a pan-european Collaborative Data Infrastructure Martin Hellmich Slides adapted from Damien Lecarpentier DCH-RP workshop, Manchester, 10 April 2013 Research Infrastructures Research Infrastructure

More information

EUDAT Common data infrastructure

EUDAT Common data infrastructure EUDAT Common data infrastructure Giuseppe Fiameni SuperComputing Applications and Innovation CINECA Italy Peter Wittenburg Max Planck Institute for Psycholinguistics Nijmegen, Netherlands some major characteristics

More information

Introduction to Grid Computing

Introduction to Grid Computing Milestone 2 Include the names of the papers You only have a page be selective about what you include Be specific; summarize the authors contributions, not just what the paper is about. You might be able

More information

Parallel File Systems. John White Lawrence Berkeley National Lab

Parallel File Systems. John White Lawrence Berkeley National Lab Parallel File Systems John White Lawrence Berkeley National Lab Topics Defining a File System Our Specific Case for File Systems Parallel File Systems A Survey of Current Parallel File Systems Implementation

More information

EUDAT Towards a Collaborative Data Infrastructure

EUDAT Towards a Collaborative Data Infrastructure EUDAT Towards a Collaborative Data Infrastructure Daan Broeder - MPI for Psycholinguistics - EUDAT - CLARIN - DASISH Bielefeld 10 th International Conference Data These days it is so very easy to create

More information

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona EUDAT Towards a pan-european Collaborative Data Infrastructure Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona Date: 7 March 2012 EUDAT Key facts Content Project Name

More information

ProMAX Cache-A Overview

ProMAX Cache-A Overview ProMAX Cache-A Overview Cache-A Archive Appliances With the proliferation of tapeless workflows, came an amazing amount of data that not only needs to be backed up, but also moved and managed. This data

More information

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure EUDAT Towards a pan-european Collaborative Data Infrastructure Giuseppe Fiameni (g.fiameni@cineca.it) Claudio Cacciari SuperComputing, Application and Innovation CINECA Johannes Reatz RZG, Germany Damien

More information

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY Data Staging and Data Movement with EUDAT Course Introduction Helsinki 10 th -12 th September, 2013 1400 Introduction Adam Carter Course Timetable TODAY 1430 The EUDAT Safe Replication Service Claudio

More information

Ultra high-speed transmission technology for wide area data movement

Ultra high-speed transmission technology for wide area data movement Ultra high-speed transmission technology for wide area data movement Michelle Munson, president & co-founder Aspera Outline Business motivation Moving ever larger file sets over commodity IP networks (public,

More information

I data set della ricerca ed il progetto EUDAT

I data set della ricerca ed il progetto EUDAT I data set della ricerca ed il progetto EUDAT Casalecchio di Reno (BO) Via Magnanelli 6/3, 40033 Casalecchio di Reno 051 6171411 www.cineca.it 1 Digital as a Global Priority 2 Focus on research data Square

More information

NCAR Globally Accessible Data Environment (GLADE) Updated: 15 Feb 2017

NCAR Globally Accessible Data Environment (GLADE) Updated: 15 Feb 2017 NCAR Globally Accessible Data Environment (GLADE) Updated: 15 Feb 2017 Overview The Globally Accessible Data Environment (GLADE) provides centralized file storage for HPC computational, data-analysis,

More information

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT. Chapter 4:- Introduction to Grid and its Evolution Prepared By:- Assistant Professor SVBIT. Overview Background: What is the Grid? Related technologies Grid applications Communities Grid Tools Case Studies

More information

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? - EUDAT Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? - Damien Lecarpentier CSC-IT Center for Science, Finland NeIC Conference Trondheim, 16 May 2013 Data trends Exponential

More information

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure EUDAT Towards a pan-european Collaborative Data Infrastructure Damien Lecarpentier CSC-IT Center for Science, Finland CESSDA workshop Tampere, 5 October 2012 EUDAT Towards a pan-european Collaborative

More information

Automatically Encapsulating HPC Best Practices Into Data Transfers

Automatically Encapsulating HPC Best Practices Into Data Transfers National Aeronautics and Space Administration Automatically Encapsulating HPC Best Practices Into Data Transfers Paul Z. Kolano NASA Advanced Supercomputing Division paul.kolano@nasa.gov www.nasa.gov Introduction

More information

USE CASES IN SEISMOLOGY. Alberto Michelini INGV

USE CASES IN SEISMOLOGY. Alberto Michelini INGV USE CASES IN SEISMOLOGY Alberto Michelini INGV PROJECTS NERA (Network of European Research Infrastructures for Earthquake Risk Assessment and Mitigation) VERCE (Virtual Earthquake and seismology Research

More information

Table 9. ASCI Data Storage Requirements

Table 9. ASCI Data Storage Requirements Table 9. ASCI Data Storage Requirements 1998 1999 2000 2001 2002 2003 2004 ASCI memory (TB) Storage Growth / Year (PB) Total Storage Capacity (PB) Single File Xfr Rate (GB/sec).44 4 1.5 4.5 8.9 15. 8 28

More information

Computer Science Section. Computational and Information Systems Laboratory National Center for Atmospheric Research

Computer Science Section. Computational and Information Systems Laboratory National Center for Atmospheric Research Computer Science Section Computational and Information Systems Laboratory National Center for Atmospheric Research My work in the context of TDD/CSS/ReSET Polynya new research computing environment Polynya

More information

Data Discovery - Introduction

Data Discovery - Introduction Data Discovery - Introduction Why (benefits of reusing data) How EUDAT's services help with this (in general) Adam Carter In days gone by: Design an experiment Getting Your Data Conduct the experiment

More information

XSEDE Software and Services Table For Service Providers and Campus Bridging

XSEDE Software and Services Table For Service Providers and Campus Bridging XSEDE Software and Services Table For Service Providers and Campus Bridging 19 February 2013 Version 1.1 Page i Table of Contents A. Document History iv B. Document Scope v C. 1 Page ii List of Figures

More information

Outline. ASP 2012 Grid School

Outline. ASP 2012 Grid School Distributed Storage Rob Quick Indiana University Slides courtesy of Derek Weitzel University of Nebraska Lincoln Outline Storage Patterns in Grid Applications Storage

More information

The Internet. Session 3 INST 301 Introduction to Information Science

The Internet. Session 3 INST 301 Introduction to Information Science The Internet Session 3 INST 301 Introduction to Information Science Outline The creation story What it is Exploring it Using it Outline The creation story What it is Exploring it Using it Source: Wikipedia

More information

Profiling Grid Data Transfer Protocols and Servers. George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA

Profiling Grid Data Transfer Protocols and Servers. George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA Profiling Grid Data Transfer Protocols and Servers George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA Motivation Scientific experiments are generating large amounts of data Education

More information

QoS on Low Bandwidth High Delay Links. Prakash Shende Planning & Engg. Team Data Network Reliance Infocomm

QoS on Low Bandwidth High Delay Links. Prakash Shende Planning & Engg. Team Data Network Reliance Infocomm QoS on Low Bandwidth High Delay Links Prakash Shende Planning & Engg. Team Data Network Reliance Infocomm Agenda QoS Some Basics What are the characteristics of High Delay Low Bandwidth link What factors

More information

EMC Solutions for Backup to Disk EMC Celerra LAN Backup to Disk with IBM Tivoli Storage Manager Best Practices Planning

EMC Solutions for Backup to Disk EMC Celerra LAN Backup to Disk with IBM Tivoli Storage Manager Best Practices Planning EMC Solutions for Backup to Disk EMC Celerra LAN Backup to Disk with IBM Tivoli Storage Manager Best Practices Planning Abstract This white paper describes how to configure the Celerra IP storage system

More information

Data transfer at CINECA: how to enjoy it!

Data transfer at CINECA: how to enjoy it! Data transfer at CINECA: how to enjoy it! Giacomo Mariani g.mariani@cineca.it Bologna 03/05/2011 www.cineca.it Table of contents Resources What we have Use Case What users need Tools How we support them

More information

Isilon Performance. Name

Isilon Performance. Name 1 Isilon Performance Name 2 Agenda Architecture Overview Next Generation Hardware Performance Caching Performance Streaming Reads Performance Tuning OneFS Architecture Overview Copyright 2014 EMC Corporation.

More information

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team B2Safe @ ODC and future EIDA/ EPOS-S plans within EUDAT2020 Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team 3rd Conference, Amsterdam, The Netherlands, 24-25 September 2014

More information

Optimizing reasonably secure Long Distance Data Transfer. How to transfer Data while being poor

Optimizing reasonably secure Long Distance Data Transfer. How to transfer Data while being poor Manfred Stolle Zuse Institute Berlin Optimizing reasonably secure Long Distance Data Transfer Or How to transfer Data while being poor 1 The situation A scientific team gains on a daily basis 250 GB of

More information

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide V7 Unified Asynchronous Replication Performance Reference Guide IBM V7 Unified R1.4.2 Asynchronous Replication Performance Reference Guide Document Version 1. SONAS / V7 Unified Asynchronous Replication

More information

A data Grid testbed environment in Gigabit WAN with HPSS

A data Grid testbed environment in Gigabit WAN with HPSS Computing in High Energy and Nuclear Physics, 24-28 March 23, La Jolla, California A data Grid testbed environment in Gigabit with HPSS Atsushi Manabe, Setsuya Kawabata, Youhei Morita, Takashi Sasaki,

More information

Extreme I/O Scaling with HDF5

Extreme I/O Scaling with HDF5 Extreme I/O Scaling with HDF5 Quincey Koziol Director of Core Software Development and HPC The HDF Group koziol@hdfgroup.org July 15, 2012 XSEDE 12 - Extreme Scaling Workshop 1 Outline Brief overview of

More information

New User Seminar: Part 2 (best practices)

New User Seminar: Part 2 (best practices) New User Seminar: Part 2 (best practices) General Interest Seminar January 2015 Hugh Merz merz@sharcnet.ca Session Outline Submitting Jobs Minimizing queue waits Investigating jobs Checkpointing Efficiency

More information

Overcoming Obstacles to Petabyte Archives

Overcoming Obstacles to Petabyte Archives Overcoming Obstacles to Petabyte Archives Mike Holland Grau Data Storage, Inc. 609 S. Taylor Ave., Unit E, Louisville CO 80027-3091 Phone: +1-303-664-0060 FAX: +1-303-664-1680 E-mail: Mike@GrauData.com

More information

Server Specifications

Server Specifications Requirements Server s It is highly recommended that MS Exchange does not run on the same server as Practice Evolve. Server Minimum Minimum spec. is influenced by choice of operating system and by number

More information

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT EUDAT A European Collaborative Data Infrastructure Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT OpenAire Interoperability Workshop Braga, Feb. 8, 2013 EUDAT Key facts

More information

Enterprise print management in VMware Horizon

Enterprise print management in VMware Horizon Enterprise print management in VMware Horizon Introduction: Embracing and Extending VMware Horizon Tricerat Simplify Printing enhances the capabilities of VMware Horizon environments by enabling reliable

More information

Zhengyang Liu University of Virginia. Oct 29, 2012

Zhengyang Liu University of Virginia. Oct 29, 2012 SDCI Net: Collaborative Research: An integrated study of datacenter networking and 100 GigE wide-area networking in support of distributed scientific computing Zhengyang Liu University of Virginia Oct

More information

Layer 4 TCP Performance in carrier transport networks Throughput is not always equal to bandwidth

Layer 4 TCP Performance in carrier transport networks Throughput is not always equal to bandwidth Layer 4 TCP Performance in carrier transport networks Throughput is not always equal to bandwidth Roland Stooss JDSU Deutschland GmbH Senior Consultant Data/IP Analysis Solutions Mühleweg 5, D-72800 Eningen

More information

RESEARCH DATA DEPOT AT PURDUE UNIVERSITY

RESEARCH DATA DEPOT AT PURDUE UNIVERSITY Preston Smith Director of Research Services RESEARCH DATA DEPOT AT PURDUE UNIVERSITY May 18, 2016 HTCONDOR WEEK 2016 Ran into Miron at a workshop recently.. Talked about data and the challenges of providing

More information

Zhengyang Liu! Oct 25, Supported by NSF Grant OCI

Zhengyang Liu! Oct 25, Supported by NSF Grant OCI SDCI Net: Collaborative Research: An integrated study of datacenter networking and 100 GigE wide-area networking in support of distributed scientific computing Zhengyang Liu! Oct 25, 2013 Supported by

More information

Data Transfers in the Grid: Workload Analysis of Globus GridFTP

Data Transfers in the Grid: Workload Analysis of Globus GridFTP Data Transfers in the Grid: Workload Analysis of Globus GridFTP Nicolas Kourtellis, Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi University of South Florida Dan Fraser Argonne National Laboratory Objective

More information

EUDAT - Open Data Services for Research

EUDAT - Open Data Services for Research EUDAT - Open Data Services for Research Johannes Reetz EUDAT operations Max Planck Computing & Data Centre Science Operations Workshop 2015 ESO, Garching 24-27th November 2015 EUDAT receives funding from

More information

XSEDE Software and Services Table For Service Providers and Campus Bridging

XSEDE Software and Services Table For Service Providers and Campus Bridging XSEDE Software and Services Table For Service Providers and Campus Bridging 24 September 2015 Version 1.4 Page i Table of Contents A. Document History iv B. Document Scope v C. 1 Page ii List of Figures

More information

THE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid

THE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid THE GLOBUS PROJECT White Paper GridFTP Universal Data Transfer for the Grid WHITE PAPER GridFTP Universal Data Transfer for the Grid September 5, 2000 Copyright 2000, The University of Chicago and The

More information

Internet Content Distribution

Internet Content Distribution Internet Content Distribution Chapter 1: Introduction Jussi Kangasharju Chapter Outline Introduction into content distribution Basic concepts TCP DNS HTTP Outline of the rest of the course Kangasharju:

More information

Grid and Component Technologies in Physics Applications

Grid and Component Technologies in Physics Applications Grid and Component Technologies in Physics Applications Svetlana Shasharina Nanbor Wang, Stefan Muszala and Roopa Pundaleeka. sveta@txcorp.com Tech-X Corporation Boulder, CO ICALEPCS07, October 15, 2007

More information

Efficient HTTP based I/O on very large datasets for high performance computing with the Libdavix library

Efficient HTTP based I/O on very large datasets for high performance computing with the Libdavix library Efficient HTTP based I/O on very large datasets for high performance computing with the Libdavix library Authors Devresse Adrien (CERN) Fabrizio Furano (CERN) Typical HPC architecture Computing Cluster

More information

SSH Bulk Transfer Performance. Allan Jude --

SSH Bulk Transfer Performance. Allan Jude -- SSH Bulk Transfer Performance Allan Jude -- allanjude@freebsd.org Introduction 15 Years as FreeBSD Server Admin FreeBSD src/doc committer (ZFS, bhyve, ucl, xo) FreeBSD Core Team (July 2016-2018) Co-Author

More information

Enabling High Performance Bulk Data Transfers With SSH

Enabling High Performance Bulk Data Transfers With SSH Enabling High Performance Bulk Data Transfers With SSH Chris Rapier Benjamin Bennett TIP 08 Moving Data Still crazy after all these years Multiple solutions exist Protocols UDT, SABUL, etc Implementations

More information

MX Sizing Guide. 4Gon Tel: +44 (0) Fax: +44 (0)

MX Sizing Guide. 4Gon   Tel: +44 (0) Fax: +44 (0) MX Sizing Guide FEBRUARY 2015 This technical document provides guidelines for choosing the right Cisco Meraki security appliance based on real-world deployments, industry standard benchmarks and in-depth

More information

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google 2017 fall DIP Heerak lim, Donghun Koo 1 Agenda Introduction Design overview Systems interactions Master operation Fault tolerance

More information

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd EUDAT Data Services & Tools for Researchers and Communities Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd CSC IT CENTER FOR SCIENCE! Founded in 1971 as a technical support

More information

An Overview of Fujitsu s Lustre Based File System

An Overview of Fujitsu s Lustre Based File System An Overview of Fujitsu s Lustre Based File System Shinji Sumimoto Fujitsu Limited Apr.12 2011 For Maximizing CPU Utilization by Minimizing File IO Overhead Outline Target System Overview Goals of Fujitsu

More information

Building a Materials Data Facility (MDF)

Building a Materials Data Facility (MDF) Building a Materials Data Facility (MDF) Ben Blaiszik (blaiszik@uchicago.edu) Ian Foster (foster@uchicago.edu) Steve Tuecke, Kyle Chard, Rachana Ananthakrishnan, Jim Pruyne (UC) Kandace Turner-Jones, John

More information

CS60021: Scalable Data Mining. Sourangshu Bhattacharya

CS60021: Scalable Data Mining. Sourangshu Bhattacharya CS60021: Scalable Data Mining Sourangshu Bhattacharya In this Lecture: Outline: HDFS Motivation HDFS User commands HDFS System architecture HDFS Implementation details Sourangshu Bhattacharya Computer

More information

LCG data management at IN2P3 CC FTS SRM dcache HPSS

LCG data management at IN2P3 CC FTS SRM dcache HPSS jeudi 26 avril 2007 LCG data management at IN2P3 CC FTS SRM dcache HPSS Jonathan Schaeffer / Lionel Schwarz dcachemaster@cc.in2p3.fr dcache Joint development by FNAL and DESY Cache disk manager with unique

More information

GRID AND COMPONENT TECHNOLOGIES IN PHYSICS APPLICATIONS

GRID AND COMPONENT TECHNOLOGIES IN PHYSICS APPLICATIONS GRID AND COMPONENT TECHNOLOGIES IN PHYSICS APPLICATIONS S. Shasharina, N.Wang, S. Muszala, R. Pundaleeka, Tech-X Corporation, Boulder, CO 80303, USA Abstract Recent advances in computer science have made

More information

A High-Performance Storage and Ultra- High-Speed File Transfer Solution for Collaborative Life Sciences Research

A High-Performance Storage and Ultra- High-Speed File Transfer Solution for Collaborative Life Sciences Research A High-Performance Storage and Ultra- High-Speed File Transfer Solution for Collaborative Life Sciences Research Storage Platforms with Aspera Overview A growing number of organizations with data-intensive

More information

Using SDSC Systems (part 2)

Using SDSC Systems (part 2) Using SDSC Systems (part 2) Running vsmp jobs, Data Transfer, I/O SDSC Summer Institute August 6-10 2012 Mahidhar Tatineni San Diego Supercomputer Center " 1 vsmp Runtime Guidelines: Overview" Identify

More information

Approaches for Upgrading to SAS 9.2. CHAPTER 1 Overview of Migrating Content to SAS 9.2

Approaches for Upgrading to SAS 9.2. CHAPTER 1 Overview of Migrating Content to SAS 9.2 1 CHAPTER 1 Overview of Migrating Content to SAS 9.2 Approaches for Upgrading to SAS 9.2 1 What is Promotion? 2 Promotion Tools 2 What Can Be Promoted? 2 Special Considerations for Promoting Metadata From

More information

A Crash Course In Wide Area Data Replication. Jacob Farmer, CTO, Cambridge Computer

A Crash Course In Wide Area Data Replication. Jacob Farmer, CTO, Cambridge Computer A Crash Course In Wide Area Data Replication Jacob Farmer, CTO, Cambridge Computer SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals

More information

Parallel Storage Systems for Large-Scale Machines

Parallel Storage Systems for Large-Scale Machines Parallel Storage Systems for Large-Scale Machines Doctoral Showcase Christos FILIPPIDIS (cfjs@outlook.com) Department of Informatics and Telecommunications, National and Kapodistrian University of Athens

More information

Protect enterprise data, achieve long-term data retention

Protect enterprise data, achieve long-term data retention Technical white paper Protect enterprise data, achieve long-term data retention HP StoreOnce Catalyst and Symantec NetBackup OpenStorage Table of contents Introduction 2 Technology overview 3 HP StoreOnce

More information

The Future of High-Performance Networking (The 5?, 10?, 15? Year Outlook)

The Future of High-Performance Networking (The 5?, 10?, 15? Year Outlook) Workshop on New Visions for Large-Scale Networks: Research & Applications Vienna, VA, USA, March 12-14, 2001 The Future of High-Performance Networking (The 5?, 10?, 15? Year Outlook) Wu-chun Feng feng@lanl.gov

More information

Student ID: CS457: Computer Networking Date: 3/20/2007 Name:

Student ID: CS457: Computer Networking Date: 3/20/2007 Name: CS457: Computer Networking Date: 3/20/2007 Name: Instructions: 1. Be sure that you have 9 questions 2. Be sure your answers are legible. 3. Write your Student ID at the top of every page 4. This is a closed

More information

GPFS on a Cray XT. Shane Canon Data Systems Group Leader Lawrence Berkeley National Laboratory CUG 2009 Atlanta, GA May 4, 2009

GPFS on a Cray XT. Shane Canon Data Systems Group Leader Lawrence Berkeley National Laboratory CUG 2009 Atlanta, GA May 4, 2009 GPFS on a Cray XT Shane Canon Data Systems Group Leader Lawrence Berkeley National Laboratory CUG 2009 Atlanta, GA May 4, 2009 Outline NERSC Global File System GPFS Overview Comparison of Lustre and GPFS

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung December 2003 ACM symposium on Operating systems principles Publisher: ACM Nov. 26, 2008 OUTLINE INTRODUCTION DESIGN OVERVIEW

More information

Beyond Petascale. Roger Haskin Manager, Parallel File Systems IBM Almaden Research Center

Beyond Petascale. Roger Haskin Manager, Parallel File Systems IBM Almaden Research Center Beyond Petascale Roger Haskin Manager, Parallel File Systems IBM Almaden Research Center GPFS Research and Development! GPFS product originated at IBM Almaden Research Laboratory! Research continues to

More information

CSE 124: Networked Services Lecture-16

CSE 124: Networked Services Lecture-16 Fall 2010 CSE 124: Networked Services Lecture-16 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/23/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments

More information

Grid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007

Grid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007 Grid Programming: Concepts and Challenges Michael Rokitka SUNY@Buffalo CSE510B 10/2007 Issues Due to Heterogeneous Hardware level Environment Different architectures, chipsets, execution speeds Software

More information

201 9 Iin n n n o o v v a a t tiin n t t S v Y e S rstie o M n 3 R S E ysq t U e IR m E R M eq E un ir T e S ments Da t e :

201 9 Iin n n n o o v v a a t tiin n t t S v Y e S rstie o M n 3 R S E ysq t U e IR m E R M eq E un ir T e S ments Da t e : 2019 Innovatint innovatint version SYSTEM 3 System REQUIREMENTS Requirements Date: 18-01-2019 Table of contents 1. Innovatint P.O.S 2 1.1 Minimal system requirements 2 1.2 Recommended system requirements

More information

Networks & protocols research in Grid5000 DAS3

Networks & protocols research in Grid5000 DAS3 1 Grid 5000 Networks & protocols research in Grid5000 DAS3 Date Pascale Vicat-Blanc Primet Senior Researcher at INRIA Leader of the RESO team LIP Laboratory UMR CNRS-INRIA-ENS-UCBL Ecole Normale Supérieure

More information

A Simple Mass Storage System for the SRB Data Grid

A Simple Mass Storage System for the SRB Data Grid A Simple Mass Storage System for the SRB Data Grid Michael Wan, Arcot Rajasekar, Reagan Moore, Phil Andrews San Diego Supercomputer Center SDSC/UCSD/NPACI Outline Motivations for implementing a Mass Storage

More information

Introduction to The Storage Resource Broker

Introduction to The Storage Resource Broker http://www.nesc.ac.uk/training http://www.ngs.ac.uk Introduction to The Storage Resource Broker http://www.pparc.ac.uk/ http://www.eu-egee.org/ Policy for re-use This presentation can be re-used for academic

More information

Safe Replication and Data Staging

Safe Replication and Data Staging Safe Replication and Data Staging Core Services of the EUDAT CDI Johannes Reetz, RZG 2nd EUDAT conference Rome, 29 Oct 2013 2 enrichment processing reduction analysis domain of registered data individual

More information

Data Management 1. Grid data management. Different sources of data. Sensors Analytic equipment Measurement tools and devices

Data Management 1. Grid data management. Different sources of data. Sensors Analytic equipment Measurement tools and devices Data Management 1 Grid data management Different sources of data Sensors Analytic equipment Measurement tools and devices Need to discover patterns in data to create information Need mechanisms to deal

More information

High Throughput WAN Data Transfer with Hadoop-based Storage

High Throughput WAN Data Transfer with Hadoop-based Storage High Throughput WAN Data Transfer with Hadoop-based Storage A Amin 2, B Bockelman 4, J Letts 1, T Levshina 3, T Martin 1, H Pi 1, I Sfiligoi 1, M Thomas 2, F Wuerthwein 1 1 University of California, San

More information

Cisco Meraki MX products come in 6 models. The chart below outlines MX hardware properties for each model:

Cisco Meraki MX products come in 6 models. The chart below outlines MX hardware properties for each model: MX Sizing Guide AUGUST 2016 This technical document provides guidelines for choosing the right Cisco Meraki security appliance based on real-world deployments, industry standard benchmarks and in-depth

More information

I/O: State of the art and Future developments

I/O: State of the art and Future developments I/O: State of the art and Future developments Giorgio Amati SCAI Dept. Rome, 18/19 May 2016 Some questions Just to know each other: Why are you here? Which is the typical I/O size you work with? GB? TB?

More information

Data Management Components for a Research Data Archive

Data Management Components for a Research Data Archive Data Management Components for a Research Data Archive Steven Worley and Bob Dattore Scientific Computing Division Computational and Information Systems Laboratory National Center for Atmospheric Research

More information

DVS, GPFS and External Lustre at NERSC How It s Working on Hopper. Tina Butler, Rei Chi Lee, Gregory Butler 05/25/11 CUG 2011

DVS, GPFS and External Lustre at NERSC How It s Working on Hopper. Tina Butler, Rei Chi Lee, Gregory Butler 05/25/11 CUG 2011 DVS, GPFS and External Lustre at NERSC How It s Working on Hopper Tina Butler, Rei Chi Lee, Gregory Butler 05/25/11 CUG 2011 1 NERSC is the Primary Computing Center for DOE Office of Science NERSC serves

More information

GFS: The Google File System

GFS: The Google File System GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one

More information

Memory Management Strategies for Data Serving with RDMA

Memory Management Strategies for Data Serving with RDMA Memory Management Strategies for Data Serving with RDMA Dennis Dalessandro and Pete Wyckoff (presenting) Ohio Supercomputer Center {dennis,pw}@osc.edu HotI'07 23 August 2007 Motivation Increasing demands

More information

European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles -

European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles - European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles - Dr.-Ing. Morris Riedel Federated Systems and Data Juelich Supercomputing Centre (JSC) Adjunct Associate Professor School

More information

Securing Grid Data Transfer Services with Active Network Portals

Securing Grid Data Transfer Services with Active Network Portals Securing with Active Network Portals Onur Demir 1 2 Kanad Ghose 3 Madhusudhan Govindaraju 4 Department of Computer Science Binghamton University (SUNY) {onur 1, mike 2, ghose 3, mgovinda 4 }@cs.binghamton.edu

More information

GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content

GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content 1st HellasGrid User Forum 10-11/1/2008 GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content Ioannis Konstantinou School of ECE

More information

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed ASPERA HIGH-SPEED TRANSFER Moving the world s data at maximum speed ASPERA HIGH-SPEED FILE TRANSFER Aspera FASP Data Transfer at 80 Gbps Elimina8ng tradi8onal bo

More information

Lecture 4: Introduction to Computer Network Design

Lecture 4: Introduction to Computer Network Design Lecture 4: Introduction to Computer Network Design Instructor: Hussein Al Osman Based on Slides by: Prof. Shervin Shirmohammadi Hussein Al Osman CEG4190 4-1 Computer Networks Hussein Al Osman CEG4190 4-2

More information

Server Specifications

Server Specifications Requirements Server s It is highly recommended that MS Exchange does not run on the same server as Practice Evolve. Server Minimum Minimum spec. is influenced by choice of operating system and by number

More information