Data Staging: Moving large amounts of data around, and moving it close to compute resources
|
|
- Theodore Ellis
- 6 years ago
- Views:
Transcription
1 Data Staging: Moving large amounts of data around, and moving it close to compute resources Digital Preserva-on Advanced Prac--oner Course Glasgow, July 19 th 2013
2 Definition Starting point Moving data around Size/Performances The user Data selection Transfer options Numbers Registered data Outline 2
3 Data movement: staging or replication? PID Safe Replica/on Data preservation and access optimization Data Staging Dynamic replication to HPC workspace for processing Data 3 3
4 Staging 4 4
5 Moving large amounts of data around Data sets can be large both in terms of Numbers of objects Single object size 5
6 All data are gray in the data staging The data types are irrelevant Except when they can affect the size or the number of the files hap:// 6
7 When I should care about data types? When you can compress them Data transfer efficiency is crucial in data staging 7
8 Time to copy 1TB 10 Mb/s network: 300 hrs (12.5 days) 100 Mb/s network: 30 hrs 1 Gb/s network: 3 hrs (are your disks fast enough?) 10 Gb/s network: 20 minutes (need really fast disks and file system) Compare these speeds to: USB 2.0 portable disk 60 MB/sec (480 Mbps) peak 20 MB/sec (160 Mbps) reported on line hours to load 1 Terabyte 8 Parallel I/O and management of large scien-fic data
9 Data Throughput Transfer Times This table available at 9 Parallel I/O and management of large scien-fic data
10 Data staging is Just in Time You do not plan it You can pre-stage sometimes There are techniques to improve the efficiency 10
11 Faster than light TCP tuning: refers to the proper configuration of buffers that correspond to TCP windowing Pipelining (of commands): speeds up lots of tiny files by stuffing multiple commands into each login session back-to-back without waiting for the first command's response 11
12 Parallelizing: on wide-area links, using multiple TCP streams in parallel (even between the same source and destination) can improve aggregate bandwidth over using a single TCP stream 12
13 Striping: data may be striped or interleaved across multiple servers 13
14 Parallelism and TCP tuning are the keys It is much easier to achieve a given performance level with four parallel connections than with one A good TCP tuning can improve drastically performances Latency interaction is critical Wide area data transfers have much higher latency than LAN transfers Many tools and protocols assume a LAN Examples: SCP/SFTP, HPSS mover protocol 14
15 Use the right tool SCP/SFTP: 10 Mb/s FTP: Mb/s assumes TCP buffer autotuning standard Unix file copy tools Parallel stream FTP: Mbps fixed 1 MB TCP window in OpenSSH only 64 KB in OpenSSH versions <
16 But efficiency is not all Easiness of use High reliability Third-party transfer Possibility to control/limit the transfer throughput to avoid engulfing the network Which priority? Different scenarios, different needs 16
17 and moving it close to compute resources 17
18 Who will move the data? Any user, part of a community or citizen scientist. In most part of the cases, a domain expert, such as a biologist, a linguist, a seismologist, not an expert of data transfer software. 18
19 Wizard gap As Ethernet speeds have increased, there has been a widening gap between ability of the novice to fully use bandwidth capabilities, compared to that of network capabilites achieved by users who have had their network tuned by an expert ('Wizard"). [Diagram c/o MaA Mathis, hap:// 19
20 Data selection Data Staging Overview 20
21 How the data sets are selected by the domain experts? Through their local tools, where local means on the community side Through remote services, generic or specific per community 21
22 Move the data 22
23 Eudat Transfer options Globus Online Third party transfer, web portal Globus client server to server to server irods specific client server to to server server HTTP options Work in progress 23
24 GridFTP: third Party Transfer Control channels GridFTP Server Data channels GridFTP Server Site A Site A 24
25 Globus OnLine Service 25
26 EUDAT PID Howto transfer #TB or PB From to HPC which site EUDAT X site Which A,B tool or C to use: GridFTP, Howto Scp, to get irods? access all sites? EUDAT site B GridFTP scp 1Gb/s WAN HPC site X EUDAT 10Gb/s WAN Globus Online irods 10Gb/s OPN EUDAT PID Bittorrent? EUDAT PID EUDAT site C EUDAT site A 26
27 Stage-out Data transport protocols: GridFTP, HTTP. SCP, FTP, RSYNC, OpenDap? WAN Which tool to manage the transfers: irods, Globus Online, FTS, RSYNC, Others.? PID EUDA T LTA EUDAT SP site A WAN Third party transfers Dedicated Network Third party transfers HPC site X HPC space EUDAT User Forum Barcelona, 7-8 March
28 Stage-out/stage-in Workflow framework Stage 1 Stage 2 Stage # PID EUDAT LTA EUDAT SP site A Data Mining cluster EUDAT User Forum Barcelona, 7-8 March
29 Multi-step staging EPCC EUDAT center Community center VPH ENES CLARIN Lifewatch CINECA BSC EPOS 29
30 Local staging targets 1. Single step 30
31 Sample Data Transfer Results Sample Results: RTT = 53 ms, network capacity = 10Gb/s. Tool Throughput scp: 140 Mb/s HPN patched scp: 1.2 Gb/s FTP: 1.4 Gb/s GridFTP, 4 streams 5.4 Gb/s GridFTP, 8 streams 6.6 Gb/s 31
32 irods irods is a data management system, which integrates a rule engine The rules can be triggered automatically based on specific events (for example a data set is moved to a particular directory) Invoked from remote via irods command line client or integrated into applications based on irods java (Jargon) and python (PyRods) API Data Staging Overview 32
33 Metadata! "Managing Large Datasets with irods a Performance Analysis" - Proceedings of the Interna-onal Mul-conference on Computer Science and Informa-on Technology pp
34 haps://openwiki.uninea.no/norstore:irods:workshop
35 Data Staging and registered data PID Data Center Store PID Data Center Store Data Center Store HPC ain of registered data 35
36 Conclusions The way to move data has to be enough flexible to accomodate different transfer protocols, different access mechanisms. Flexibility means also that the transfer tools can be used as they are, with default parameters, for average performances, but also fine tuned by experts for faster transfers. No solution fits all, so different services are provided 36
37 e- IRG Workshop - Dublin - Ireland 37 37
38 Thank you! 38
Data Staging: Moving large amounts of data around, and moving it close to compute resources
Data Staging: Moving large amounts of data around, and moving it close to compute resources PRACE advanced training course on Data Staging and Data Movement Helsinki, September 10 th 2013 Claudio Cacciari
More informationData Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013
Data Replication: Automated move and copy of data PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013 Claudio Cacciari c.cacciari@cineca.it Outline The issue
More informationEUDAT. Towards a pan-european Collaborative Data Infrastructure. KNMI Workshop, Utrecht, Netherlands
EUDAT Towards a pan-european Collaborative Data Infrastructure Giuseppe Fiameni, Claudio Cacciari SuperComputing Centre, CINECA, Italy Peter Wittenburg, Daan Broeder The Language Archive, Max Planck Institute,
More informationEUDAT. Towards a Collaborative Data Infrastructure. Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012
EUDAT Towards a Collaborative Data Infrastructure Ari Lukkarinen CSC-IT Center for Science, Finland NORDUnet 2012 Oslo, 18 August 2012 Big (Chaotic) Data DATA GENERATORS 1) Measurement technology. 2) Cheap
More informationEUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members
EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members Agenda Background information Services Common Data Infrastructure
More informationUsing EUDAT services to replicate, store, share, and find cultural heritage data
Using EUDAT services to replicate, store, share, and find cultural heritage data in PSNC and beyond Maciej Brzeźniak, Norbert Meyer, PSNC Damien Lecarpentier, CSC 3 October 203 DIGITAL PRESERVATION OF
More informationFile Transfer: Basics and Best Practices. Joon Kim. Ph.D. PICSciE. Research Computing 09/07/2018
File Transfer: Basics and Best Practices Joon Kim. Ph.D. PICSciE Research Computing Workshop @Chemistry 09/07/2018 Our goal today Learn about data transfer basics Pick the right tool for your job Know
More informationIntroduction to High Performance Parallel I/O
Introduction to High Performance Parallel I/O Richard Gerber Deputy Group Lead NERSC User Services August 30, 2013-1- Some slides from Katie Antypas I/O Needs Getting Bigger All the Time I/O needs growing
More informationEUDAT & AAI. Daan Broeder MPI for Psycholinguistics
EUDAT & AAI Daan Broeder MPI for Psycholinguistics Initially six research communities on Board EPOS: European Plate Observatory System CLARIN: Common Language Resources and Technology Infrastructure ENES:
More informationEUDAT. Towards a pan-european Collaborative Data Infrastructure
EUDAT Towards a pan-european Collaborative Data Infrastructure Martin Hellmich Slides adapted from Damien Lecarpentier DCH-RP workshop, Manchester, 10 April 2013 Research Infrastructures Research Infrastructure
More informationEUDAT Common data infrastructure
EUDAT Common data infrastructure Giuseppe Fiameni SuperComputing Applications and Innovation CINECA Italy Peter Wittenburg Max Planck Institute for Psycholinguistics Nijmegen, Netherlands some major characteristics
More informationIntroduction to Grid Computing
Milestone 2 Include the names of the papers You only have a page be selective about what you include Be specific; summarize the authors contributions, not just what the paper is about. You might be able
More informationParallel File Systems. John White Lawrence Berkeley National Lab
Parallel File Systems John White Lawrence Berkeley National Lab Topics Defining a File System Our Specific Case for File Systems Parallel File Systems A Survey of Current Parallel File Systems Implementation
More informationEUDAT Towards a Collaborative Data Infrastructure
EUDAT Towards a Collaborative Data Infrastructure Daan Broeder - MPI for Psycholinguistics - EUDAT - CLARIN - DASISH Bielefeld 10 th International Conference Data These days it is so very easy to create
More informationEUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona
EUDAT Towards a pan-european Collaborative Data Infrastructure Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona Date: 7 March 2012 EUDAT Key facts Content Project Name
More informationProMAX Cache-A Overview
ProMAX Cache-A Overview Cache-A Archive Appliances With the proliferation of tapeless workflows, came an amazing amount of data that not only needs to be backed up, but also moved and managed. This data
More informationEUDAT. Towards a pan-european Collaborative Data Infrastructure
EUDAT Towards a pan-european Collaborative Data Infrastructure Giuseppe Fiameni (g.fiameni@cineca.it) Claudio Cacciari SuperComputing, Application and Innovation CINECA Johannes Reatz RZG, Germany Damien
More informationData Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY
Data Staging and Data Movement with EUDAT Course Introduction Helsinki 10 th -12 th September, 2013 1400 Introduction Adam Carter Course Timetable TODAY 1430 The EUDAT Safe Replication Service Claudio
More informationUltra high-speed transmission technology for wide area data movement
Ultra high-speed transmission technology for wide area data movement Michelle Munson, president & co-founder Aspera Outline Business motivation Moving ever larger file sets over commodity IP networks (public,
More informationI data set della ricerca ed il progetto EUDAT
I data set della ricerca ed il progetto EUDAT Casalecchio di Reno (BO) Via Magnanelli 6/3, 40033 Casalecchio di Reno 051 6171411 www.cineca.it 1 Digital as a Global Priority 2 Focus on research data Square
More informationNCAR Globally Accessible Data Environment (GLADE) Updated: 15 Feb 2017
NCAR Globally Accessible Data Environment (GLADE) Updated: 15 Feb 2017 Overview The Globally Accessible Data Environment (GLADE) provides centralized file storage for HPC computational, data-analysis,
More informationChapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.
Chapter 4:- Introduction to Grid and its Evolution Prepared By:- Assistant Professor SVBIT. Overview Background: What is the Grid? Related technologies Grid applications Communities Grid Tools Case Studies
More informationEUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -
EUDAT Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? - Damien Lecarpentier CSC-IT Center for Science, Finland NeIC Conference Trondheim, 16 May 2013 Data trends Exponential
More informationEUDAT. Towards a pan-european Collaborative Data Infrastructure
EUDAT Towards a pan-european Collaborative Data Infrastructure Damien Lecarpentier CSC-IT Center for Science, Finland CESSDA workshop Tampere, 5 October 2012 EUDAT Towards a pan-european Collaborative
More informationAutomatically Encapsulating HPC Best Practices Into Data Transfers
National Aeronautics and Space Administration Automatically Encapsulating HPC Best Practices Into Data Transfers Paul Z. Kolano NASA Advanced Supercomputing Division paul.kolano@nasa.gov www.nasa.gov Introduction
More informationUSE CASES IN SEISMOLOGY. Alberto Michelini INGV
USE CASES IN SEISMOLOGY Alberto Michelini INGV PROJECTS NERA (Network of European Research Infrastructures for Earthquake Risk Assessment and Mitigation) VERCE (Virtual Earthquake and seismology Research
More informationTable 9. ASCI Data Storage Requirements
Table 9. ASCI Data Storage Requirements 1998 1999 2000 2001 2002 2003 2004 ASCI memory (TB) Storage Growth / Year (PB) Total Storage Capacity (PB) Single File Xfr Rate (GB/sec).44 4 1.5 4.5 8.9 15. 8 28
More informationComputer Science Section. Computational and Information Systems Laboratory National Center for Atmospheric Research
Computer Science Section Computational and Information Systems Laboratory National Center for Atmospheric Research My work in the context of TDD/CSS/ReSET Polynya new research computing environment Polynya
More informationData Discovery - Introduction
Data Discovery - Introduction Why (benefits of reusing data) How EUDAT's services help with this (in general) Adam Carter In days gone by: Design an experiment Getting Your Data Conduct the experiment
More informationXSEDE Software and Services Table For Service Providers and Campus Bridging
XSEDE Software and Services Table For Service Providers and Campus Bridging 19 February 2013 Version 1.1 Page i Table of Contents A. Document History iv B. Document Scope v C. 1 Page ii List of Figures
More informationOutline. ASP 2012 Grid School
Distributed Storage Rob Quick Indiana University Slides courtesy of Derek Weitzel University of Nebraska Lincoln Outline Storage Patterns in Grid Applications Storage
More informationThe Internet. Session 3 INST 301 Introduction to Information Science
The Internet Session 3 INST 301 Introduction to Information Science Outline The creation story What it is Exploring it Using it Outline The creation story What it is Exploring it Using it Source: Wikipedia
More informationProfiling Grid Data Transfer Protocols and Servers. George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA
Profiling Grid Data Transfer Protocols and Servers George Kola, Tevfik Kosar and Miron Livny University of Wisconsin-Madison USA Motivation Scientific experiments are generating large amounts of data Education
More informationQoS on Low Bandwidth High Delay Links. Prakash Shende Planning & Engg. Team Data Network Reliance Infocomm
QoS on Low Bandwidth High Delay Links Prakash Shende Planning & Engg. Team Data Network Reliance Infocomm Agenda QoS Some Basics What are the characteristics of High Delay Low Bandwidth link What factors
More informationEMC Solutions for Backup to Disk EMC Celerra LAN Backup to Disk with IBM Tivoli Storage Manager Best Practices Planning
EMC Solutions for Backup to Disk EMC Celerra LAN Backup to Disk with IBM Tivoli Storage Manager Best Practices Planning Abstract This white paper describes how to configure the Celerra IP storage system
More informationData transfer at CINECA: how to enjoy it!
Data transfer at CINECA: how to enjoy it! Giacomo Mariani g.mariani@cineca.it Bologna 03/05/2011 www.cineca.it Table of contents Resources What we have Use Case What users need Tools How we support them
More informationIsilon Performance. Name
1 Isilon Performance Name 2 Agenda Architecture Overview Next Generation Hardware Performance Caching Performance Streaming Reads Performance Tuning OneFS Architecture Overview Copyright 2014 EMC Corporation.
More informationODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team
B2Safe @ ODC and future EIDA/ EPOS-S plans within EUDAT2020 Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team 3rd Conference, Amsterdam, The Netherlands, 24-25 September 2014
More informationOptimizing reasonably secure Long Distance Data Transfer. How to transfer Data while being poor
Manfred Stolle Zuse Institute Berlin Optimizing reasonably secure Long Distance Data Transfer Or How to transfer Data while being poor 1 The situation A scientific team gains on a daily basis 250 GB of
More informationIBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide
V7 Unified Asynchronous Replication Performance Reference Guide IBM V7 Unified R1.4.2 Asynchronous Replication Performance Reference Guide Document Version 1. SONAS / V7 Unified Asynchronous Replication
More informationA data Grid testbed environment in Gigabit WAN with HPSS
Computing in High Energy and Nuclear Physics, 24-28 March 23, La Jolla, California A data Grid testbed environment in Gigabit with HPSS Atsushi Manabe, Setsuya Kawabata, Youhei Morita, Takashi Sasaki,
More informationExtreme I/O Scaling with HDF5
Extreme I/O Scaling with HDF5 Quincey Koziol Director of Core Software Development and HPC The HDF Group koziol@hdfgroup.org July 15, 2012 XSEDE 12 - Extreme Scaling Workshop 1 Outline Brief overview of
More informationNew User Seminar: Part 2 (best practices)
New User Seminar: Part 2 (best practices) General Interest Seminar January 2015 Hugh Merz merz@sharcnet.ca Session Outline Submitting Jobs Minimizing queue waits Investigating jobs Checkpointing Efficiency
More informationOvercoming Obstacles to Petabyte Archives
Overcoming Obstacles to Petabyte Archives Mike Holland Grau Data Storage, Inc. 609 S. Taylor Ave., Unit E, Louisville CO 80027-3091 Phone: +1-303-664-0060 FAX: +1-303-664-1680 E-mail: Mike@GrauData.com
More informationServer Specifications
Requirements Server s It is highly recommended that MS Exchange does not run on the same server as Practice Evolve. Server Minimum Minimum spec. is influenced by choice of operating system and by number
More informationEUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT
EUDAT A European Collaborative Data Infrastructure Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT OpenAire Interoperability Workshop Braga, Feb. 8, 2013 EUDAT Key facts
More informationEnterprise print management in VMware Horizon
Enterprise print management in VMware Horizon Introduction: Embracing and Extending VMware Horizon Tricerat Simplify Printing enhances the capabilities of VMware Horizon environments by enabling reliable
More informationZhengyang Liu University of Virginia. Oct 29, 2012
SDCI Net: Collaborative Research: An integrated study of datacenter networking and 100 GigE wide-area networking in support of distributed scientific computing Zhengyang Liu University of Virginia Oct
More informationLayer 4 TCP Performance in carrier transport networks Throughput is not always equal to bandwidth
Layer 4 TCP Performance in carrier transport networks Throughput is not always equal to bandwidth Roland Stooss JDSU Deutschland GmbH Senior Consultant Data/IP Analysis Solutions Mühleweg 5, D-72800 Eningen
More informationRESEARCH DATA DEPOT AT PURDUE UNIVERSITY
Preston Smith Director of Research Services RESEARCH DATA DEPOT AT PURDUE UNIVERSITY May 18, 2016 HTCONDOR WEEK 2016 Ran into Miron at a workshop recently.. Talked about data and the challenges of providing
More informationZhengyang Liu! Oct 25, Supported by NSF Grant OCI
SDCI Net: Collaborative Research: An integrated study of datacenter networking and 100 GigE wide-area networking in support of distributed scientific computing Zhengyang Liu! Oct 25, 2013 Supported by
More informationData Transfers in the Grid: Workload Analysis of Globus GridFTP
Data Transfers in the Grid: Workload Analysis of Globus GridFTP Nicolas Kourtellis, Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi University of South Florida Dan Fraser Argonne National Laboratory Objective
More informationEUDAT - Open Data Services for Research
EUDAT - Open Data Services for Research Johannes Reetz EUDAT operations Max Planck Computing & Data Centre Science Operations Workshop 2015 ESO, Garching 24-27th November 2015 EUDAT receives funding from
More informationXSEDE Software and Services Table For Service Providers and Campus Bridging
XSEDE Software and Services Table For Service Providers and Campus Bridging 24 September 2015 Version 1.4 Page i Table of Contents A. Document History iv B. Document Scope v C. 1 Page ii List of Figures
More informationTHE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid
THE GLOBUS PROJECT White Paper GridFTP Universal Data Transfer for the Grid WHITE PAPER GridFTP Universal Data Transfer for the Grid September 5, 2000 Copyright 2000, The University of Chicago and The
More informationInternet Content Distribution
Internet Content Distribution Chapter 1: Introduction Jussi Kangasharju Chapter Outline Introduction into content distribution Basic concepts TCP DNS HTTP Outline of the rest of the course Kangasharju:
More informationGrid and Component Technologies in Physics Applications
Grid and Component Technologies in Physics Applications Svetlana Shasharina Nanbor Wang, Stefan Muszala and Roopa Pundaleeka. sveta@txcorp.com Tech-X Corporation Boulder, CO ICALEPCS07, October 15, 2007
More informationEfficient HTTP based I/O on very large datasets for high performance computing with the Libdavix library
Efficient HTTP based I/O on very large datasets for high performance computing with the Libdavix library Authors Devresse Adrien (CERN) Fabrizio Furano (CERN) Typical HPC architecture Computing Cluster
More informationSSH Bulk Transfer Performance. Allan Jude --
SSH Bulk Transfer Performance Allan Jude -- allanjude@freebsd.org Introduction 15 Years as FreeBSD Server Admin FreeBSD src/doc committer (ZFS, bhyve, ucl, xo) FreeBSD Core Team (July 2016-2018) Co-Author
More informationEnabling High Performance Bulk Data Transfers With SSH
Enabling High Performance Bulk Data Transfers With SSH Chris Rapier Benjamin Bennett TIP 08 Moving Data Still crazy after all these years Multiple solutions exist Protocols UDT, SABUL, etc Implementations
More informationMX Sizing Guide. 4Gon Tel: +44 (0) Fax: +44 (0)
MX Sizing Guide FEBRUARY 2015 This technical document provides guidelines for choosing the right Cisco Meraki security appliance based on real-world deployments, industry standard benchmarks and in-depth
More informationGoogle File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo
Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google 2017 fall DIP Heerak lim, Donghun Koo 1 Agenda Introduction Design overview Systems interactions Master operation Fault tolerance
More informationEUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd
EUDAT Data Services & Tools for Researchers and Communities Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd CSC IT CENTER FOR SCIENCE! Founded in 1971 as a technical support
More informationAn Overview of Fujitsu s Lustre Based File System
An Overview of Fujitsu s Lustre Based File System Shinji Sumimoto Fujitsu Limited Apr.12 2011 For Maximizing CPU Utilization by Minimizing File IO Overhead Outline Target System Overview Goals of Fujitsu
More informationBuilding a Materials Data Facility (MDF)
Building a Materials Data Facility (MDF) Ben Blaiszik (blaiszik@uchicago.edu) Ian Foster (foster@uchicago.edu) Steve Tuecke, Kyle Chard, Rachana Ananthakrishnan, Jim Pruyne (UC) Kandace Turner-Jones, John
More informationCS60021: Scalable Data Mining. Sourangshu Bhattacharya
CS60021: Scalable Data Mining Sourangshu Bhattacharya In this Lecture: Outline: HDFS Motivation HDFS User commands HDFS System architecture HDFS Implementation details Sourangshu Bhattacharya Computer
More informationLCG data management at IN2P3 CC FTS SRM dcache HPSS
jeudi 26 avril 2007 LCG data management at IN2P3 CC FTS SRM dcache HPSS Jonathan Schaeffer / Lionel Schwarz dcachemaster@cc.in2p3.fr dcache Joint development by FNAL and DESY Cache disk manager with unique
More informationGRID AND COMPONENT TECHNOLOGIES IN PHYSICS APPLICATIONS
GRID AND COMPONENT TECHNOLOGIES IN PHYSICS APPLICATIONS S. Shasharina, N.Wang, S. Muszala, R. Pundaleeka, Tech-X Corporation, Boulder, CO 80303, USA Abstract Recent advances in computer science have made
More informationA High-Performance Storage and Ultra- High-Speed File Transfer Solution for Collaborative Life Sciences Research
A High-Performance Storage and Ultra- High-Speed File Transfer Solution for Collaborative Life Sciences Research Storage Platforms with Aspera Overview A growing number of organizations with data-intensive
More informationUsing SDSC Systems (part 2)
Using SDSC Systems (part 2) Running vsmp jobs, Data Transfer, I/O SDSC Summer Institute August 6-10 2012 Mahidhar Tatineni San Diego Supercomputer Center " 1 vsmp Runtime Guidelines: Overview" Identify
More informationApproaches for Upgrading to SAS 9.2. CHAPTER 1 Overview of Migrating Content to SAS 9.2
1 CHAPTER 1 Overview of Migrating Content to SAS 9.2 Approaches for Upgrading to SAS 9.2 1 What is Promotion? 2 Promotion Tools 2 What Can Be Promoted? 2 Special Considerations for Promoting Metadata From
More informationA Crash Course In Wide Area Data Replication. Jacob Farmer, CTO, Cambridge Computer
A Crash Course In Wide Area Data Replication Jacob Farmer, CTO, Cambridge Computer SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals
More informationParallel Storage Systems for Large-Scale Machines
Parallel Storage Systems for Large-Scale Machines Doctoral Showcase Christos FILIPPIDIS (cfjs@outlook.com) Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
More informationProtect enterprise data, achieve long-term data retention
Technical white paper Protect enterprise data, achieve long-term data retention HP StoreOnce Catalyst and Symantec NetBackup OpenStorage Table of contents Introduction 2 Technology overview 3 HP StoreOnce
More informationThe Future of High-Performance Networking (The 5?, 10?, 15? Year Outlook)
Workshop on New Visions for Large-Scale Networks: Research & Applications Vienna, VA, USA, March 12-14, 2001 The Future of High-Performance Networking (The 5?, 10?, 15? Year Outlook) Wu-chun Feng feng@lanl.gov
More informationStudent ID: CS457: Computer Networking Date: 3/20/2007 Name:
CS457: Computer Networking Date: 3/20/2007 Name: Instructions: 1. Be sure that you have 9 questions 2. Be sure your answers are legible. 3. Write your Student ID at the top of every page 4. This is a closed
More informationGPFS on a Cray XT. Shane Canon Data Systems Group Leader Lawrence Berkeley National Laboratory CUG 2009 Atlanta, GA May 4, 2009
GPFS on a Cray XT Shane Canon Data Systems Group Leader Lawrence Berkeley National Laboratory CUG 2009 Atlanta, GA May 4, 2009 Outline NERSC Global File System GPFS Overview Comparison of Lustre and GPFS
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung December 2003 ACM symposium on Operating systems principles Publisher: ACM Nov. 26, 2008 OUTLINE INTRODUCTION DESIGN OVERVIEW
More informationBeyond Petascale. Roger Haskin Manager, Parallel File Systems IBM Almaden Research Center
Beyond Petascale Roger Haskin Manager, Parallel File Systems IBM Almaden Research Center GPFS Research and Development! GPFS product originated at IBM Almaden Research Laboratory! Research continues to
More informationCSE 124: Networked Services Lecture-16
Fall 2010 CSE 124: Networked Services Lecture-16 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/23/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments
More informationGrid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007
Grid Programming: Concepts and Challenges Michael Rokitka SUNY@Buffalo CSE510B 10/2007 Issues Due to Heterogeneous Hardware level Environment Different architectures, chipsets, execution speeds Software
More information201 9 Iin n n n o o v v a a t tiin n t t S v Y e S rstie o M n 3 R S E ysq t U e IR m E R M eq E un ir T e S ments Da t e :
2019 Innovatint innovatint version SYSTEM 3 System REQUIREMENTS Requirements Date: 18-01-2019 Table of contents 1. Innovatint P.O.S 2 1.1 Minimal system requirements 2 1.2 Recommended system requirements
More informationNetworks & protocols research in Grid5000 DAS3
1 Grid 5000 Networks & protocols research in Grid5000 DAS3 Date Pascale Vicat-Blanc Primet Senior Researcher at INRIA Leader of the RESO team LIP Laboratory UMR CNRS-INRIA-ENS-UCBL Ecole Normale Supérieure
More informationA Simple Mass Storage System for the SRB Data Grid
A Simple Mass Storage System for the SRB Data Grid Michael Wan, Arcot Rajasekar, Reagan Moore, Phil Andrews San Diego Supercomputer Center SDSC/UCSD/NPACI Outline Motivations for implementing a Mass Storage
More informationIntroduction to The Storage Resource Broker
http://www.nesc.ac.uk/training http://www.ngs.ac.uk Introduction to The Storage Resource Broker http://www.pparc.ac.uk/ http://www.eu-egee.org/ Policy for re-use This presentation can be re-used for academic
More informationSafe Replication and Data Staging
Safe Replication and Data Staging Core Services of the EUDAT CDI Johannes Reetz, RZG 2nd EUDAT conference Rome, 29 Oct 2013 2 enrichment processing reduction analysis domain of registered data individual
More informationData Management 1. Grid data management. Different sources of data. Sensors Analytic equipment Measurement tools and devices
Data Management 1 Grid data management Different sources of data Sensors Analytic equipment Measurement tools and devices Need to discover patterns in data to create information Need mechanisms to deal
More informationHigh Throughput WAN Data Transfer with Hadoop-based Storage
High Throughput WAN Data Transfer with Hadoop-based Storage A Amin 2, B Bockelman 4, J Letts 1, T Levshina 3, T Martin 1, H Pi 1, I Sfiligoi 1, M Thomas 2, F Wuerthwein 1 1 University of California, San
More informationCisco Meraki MX products come in 6 models. The chart below outlines MX hardware properties for each model:
MX Sizing Guide AUGUST 2016 This technical document provides guidelines for choosing the right Cisco Meraki security appliance based on real-world deployments, industry standard benchmarks and in-depth
More informationI/O: State of the art and Future developments
I/O: State of the art and Future developments Giorgio Amati SCAI Dept. Rome, 18/19 May 2016 Some questions Just to know each other: Why are you here? Which is the typical I/O size you work with? GB? TB?
More informationData Management Components for a Research Data Archive
Data Management Components for a Research Data Archive Steven Worley and Bob Dattore Scientific Computing Division Computational and Information Systems Laboratory National Center for Atmospheric Research
More informationDVS, GPFS and External Lustre at NERSC How It s Working on Hopper. Tina Butler, Rei Chi Lee, Gregory Butler 05/25/11 CUG 2011
DVS, GPFS and External Lustre at NERSC How It s Working on Hopper Tina Butler, Rei Chi Lee, Gregory Butler 05/25/11 CUG 2011 1 NERSC is the Primary Computing Center for DOE Office of Science NERSC serves
More informationGFS: The Google File System
GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one
More informationMemory Management Strategies for Data Serving with RDMA
Memory Management Strategies for Data Serving with RDMA Dennis Dalessandro and Pete Wyckoff (presenting) Ohio Supercomputer Center {dennis,pw}@osc.edu HotI'07 23 August 2007 Motivation Increasing demands
More informationEuropean Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles -
European Collaborative Data Infrastructure EUDAT - Training on EUDAT Principles - Dr.-Ing. Morris Riedel Federated Systems and Data Juelich Supercomputing Centre (JSC) Adjunct Associate Professor School
More informationSecuring Grid Data Transfer Services with Active Network Portals
Securing with Active Network Portals Onur Demir 1 2 Kanad Ghose 3 Madhusudhan Govindaraju 4 Department of Computer Science Binghamton University (SUNY) {onur 1, mike 2, ghose 3, mgovinda 4 }@cs.binghamton.edu
More informationGridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content
1st HellasGrid User Forum 10-11/1/2008 GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content Ioannis Konstantinou School of ECE
More informationASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed
ASPERA HIGH-SPEED TRANSFER Moving the world s data at maximum speed ASPERA HIGH-SPEED FILE TRANSFER Aspera FASP Data Transfer at 80 Gbps Elimina8ng tradi8onal bo
More informationLecture 4: Introduction to Computer Network Design
Lecture 4: Introduction to Computer Network Design Instructor: Hussein Al Osman Based on Slides by: Prof. Shervin Shirmohammadi Hussein Al Osman CEG4190 4-1 Computer Networks Hussein Al Osman CEG4190 4-2
More informationServer Specifications
Requirements Server s It is highly recommended that MS Exchange does not run on the same server as Practice Evolve. Server Minimum Minimum spec. is influenced by choice of operating system and by number
More information