A Reproducible Framework Powered by Globus Tanu Malik, Kyle Chard*, Ian Foster

Size: px
Start display at page:

Download "A Reproducible Framework Powered by Globus Tanu Malik, Kyle Chard*, Ian Foster"

Transcription

1 A Reproducible Framework Powered by Globus Tanu Malik, Kyle Chard*, Ian Foster Computation Institute University of Chicago and Argonne National Laboratory

2 Sharing and Reproducibility Alice wants to share her models and simulation output with Bob Bob wants to re-execute Alice s application to validate her analysis

3 Current Approaches Compressed archive (zip, tar) Metadata encoded in filenames, paths, and README files Shared user accounts Ad hoc websites with model code, parameters, and data Submission to domain or institutional repositories Creation and sharing of virtual machines

4 User Frustrations I cannot find the lib.so required to build the model I can t find input data to run the model I don t know the parameters used to execute the analysis Amount of pain Bob suffers Amount of pain Alice suffers

5 User Frustrations I cannot find the lib.so required to build the model I can t find input data to run the model I don t know the parameters used to execute the analysis Lack of easy and efficient methods for sharing and Amount of pain Bob suffers reproducibility Amount of pain Alice suffers

6 Reproducibility Requirements Automatically solve dependency hell E.g., incompatible library versions Connect programs with data and capture data flow Which version of the program produced this data? Support annotation of human knowledge Sufficient documentation to install and run the program Enable reproducibility efficiently and with minimal intervention No change of programming or authoring environments

7 Reproducibility Framework Machine A data Source code Application Globus Catalog (Docker Hub, GitHub,DataHub) System files (/bin, /lib, ) Parameters configuration 1 Network connections 2 SciUnit 3 3 Share/ Transfer Machine B Globus Data Publication SciUnit 4 Execution Platform (Docker, PTU, chroot) Capture scientific activity Source code, data, environment, including data flow between processes Preserve as SciUnit Capture as files or as detailed metadata Share and distribute Re-execute and re-analyze

8 Components SciUnits Units of scientific activity/research output Metadata catalog A scalable & flexible cloud-based catalog for creating datasets and associating annotations Globus services for sharing, transferring, and publishing SciUnits Replay capability through native re-execution, Docker or Vagrant Run SciUnits without installation or configuration and metadata information

9 Simplifying Data Management for Geoscience Models Tanu Malik, Ian Foster, Kyle Chard, Joseph Baker, Mike Gurnis, Jonathan Goodall, Scott Peckham

10 Science Drivers Solid Earth Hydrology Space Science CSDMS

11 Science Usecases Solid earth: geounits of 2D and 3D kinematic geoscience models, visualize through GPlates and modifying GPML data files Goal: sharing, preserving, publishing visualization sessions with data Space Science: geounits of SuperDARN data with analysis tools as available from the Baker Lab at VT Goal: sharing and publishing geounits Hydrology: geounits of irods workflows on VIC models Goal: demonstrate end-to-end reproducibility with irods

12

13 Thank You!

Visualization for Scientists. We discuss how Deluge and Complexity call for new ideas in data exploration. Learn more, find tools at layerscape.

Visualization for Scientists. We discuss how Deluge and Complexity call for new ideas in data exploration. Learn more, find tools at layerscape. Visualization for Scientists We discuss how Deluge and Complexity call for new ideas in data exploration. Learn more, find tools at layerscape.org Transfer and synchronize files Easy fire-and-forget transfers

More information

Globus Platform Services for Data Publication. Greg Nawrocki University of Chicago & Argonne National Lab GeoDaRRS August 7, 2018

Globus Platform Services for Data Publication. Greg Nawrocki University of Chicago & Argonne National Lab GeoDaRRS August 7, 2018 Globus Platform Services for Data Publication Greg Nawrocki greg@globus.org University of Chicago & Argonne National Lab GeoDaRRS August 7, 2018 Outline Globus Overview Globus Data Publication v1 Lessons

More information

Data publication and discovery with Globus

Data publication and discovery with Globus Data publication and discovery with Globus Questions and comments to outreach@globus.org The Globus data publication and discovery services make it easy for institutions and projects to establish collections,

More information

The Materials Data Facility

The Materials Data Facility The Materials Data Facility Ben Blaiszik (blaiszik@uchicago.edu), Kyle Chard (chard@uchicago.edu) Ian Foster (foster@uchicago.edu) materialsdatafacility.org What is MDF? We aim to make it simple for materials

More information

Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure

Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure Arnab K. Paul, Ryan Chard, Kyle Chard, Steven Tuecke, Ali R. Butt, Ian Foster Virginia Tech, Argonne National

More information

Design patterns for data-driven research acceleration

Design patterns for data-driven research acceleration Design patterns for data-driven research acceleration Rachana Ananthakrishnan, Kyle Chard, and Ian Foster The University of Chicago and Argonne National Laboratory Contact: rachana@globus.org Introduction

More information

Parsl: Developing Interactive Parallel Workflows in Python using Parsl

Parsl: Developing Interactive Parallel Workflows in Python using Parsl Parsl: Developing Interactive Parallel Workflows in Python using Parsl Kyle Chard (chard@uchicago.edu) Yadu Babuji, Anna Woodard, Zhuozhao Li, Ben Clifford, Ian Foster, Dan Katz, Mike Wilde, Justin Wozniak

More information

Building a Materials Data Facility (MDF)

Building a Materials Data Facility (MDF) Building a Materials Data Facility (MDF) Ben Blaiszik (blaiszik@uchicago.edu) Ian Foster (foster@uchicago.edu) Steve Tuecke, Kyle Chard, Rachana Ananthakrishnan, Jim Pruyne (UC) Kandace Turner-Jones, John

More information

Exercise 6a: Using free and/or open source tools to build workflows to manipulate. LAStools

Exercise 6a: Using free and/or open source tools to build workflows to manipulate. LAStools Exercise 6a: Using free and/or open source tools to build workflows to manipulate and process LiDAR data: LAStools Christopher Crosby Last Revised: December 1st, 2009 Exercises in this series: 1. LAStools

More information

Mathematics and Computer Science Division. Department of Agricultural and Biological Engineering

Mathematics and Computer Science Division. Department of Agricultural and Biological Engineering Mathematics and Computer Science Division Department of Science and Technologies University of Naples Parthenope FACE-IT: Earth science workflows made easy with Globus and Galaxy technologies (Provide

More information

irods for Data Management and Archiving UGM 2018 Masilamani Subramanyam

irods for Data Management and Archiving UGM 2018 Masilamani Subramanyam irods for Data Management and Archiving UGM 2018 Masilamani Subramanyam Agenda Introduction Challenges Data Transfer Solution irods use in Data Transfer Solution irods Proof-of-Concept Q&A Introduction

More information

The NASA/GSFC Advanced Data Grid: A Prototype for Future Earth Science Ground System Architectures

The NASA/GSFC Advanced Data Grid: A Prototype for Future Earth Science Ground System Architectures The NASA/GSFC Advanced Data Grid: A Prototype for Future Earth Science Ground System Architectures Samuel D. Gasster, Craig A. Lee, Brooks Davis, Matt Clark, Mike AuYeung, John R. Wilson Computer Systems

More information

Climate Data Management using Globus

Climate Data Management using Globus Climate Data Management using Globus Computation Institute Rachana Ananthakrishnan (ranantha@uchicago.edu) Data Management Challenges Transfers often take longer than expected based on available network

More information

Developing a Research Data Policy

Developing a Research Data Policy Developing a Research Data Policy Core Elements of the Content of a Research Data Management Policy This document may be useful for defining research data, explaining what RDM is, illustrating workflows,

More information

T2C2. A timely and trusted curator and coordinator of scientific data

T2C2. A timely and trusted curator and coordinator of scientific data Accelerating Science via Smart and Joint Cyber-Infrastructure for Materials and Semiconductor Fabrication Data and Metadata Klara Nahrstedt University of Illinois at Urbana-Champaign A timely and trusted

More information

SharePoint 2010 Enterprise Content Management for IT Pros. Mirjam van Olst Macaw

SharePoint 2010 Enterprise Content Management for IT Pros. Mirjam van Olst Macaw SharePoint 2010 Enterprise Content Management for IT Pros Mirjam van Olst Macaw About Mirjam Blog: http://sharepointchick.com Email: mirjam@macaw.nl Twitter: @mirjamvanolst Agenda Managed Metadata Service

More information

Geospatial Records and Your Archives

Geospatial Records and Your Archives The collections of scientific data acquired with government and private support are the foundation for our understanding of the physical world and for our capabilities to predict changes in that world.

More information

Running user-defined functions in R on Earth observation data in cloud back-ends

Running user-defined functions in R on Earth observation data in cloud back-ends Running user-defined functions in R on Earth observation data in cloud back-ends Pramit Ghosh, Florian Lahn, Sören Gebbert, Matthias Mohr and Edzer Pebesma Institute for Geoinformatics, University of Münster

More information

FuncX: A Function Serving Platform for HPC. Ryan Chard 28 Jan 2019

FuncX: A Function Serving Platform for HPC. Ryan Chard 28 Jan 2019 FuncX: A Function Serving Platform for HPC Ryan Chard 28 Jan 2019 Outline - Motivation FuncX: FaaS for HPC Implementation status Preliminary applications - Machine learning inference Automating analysis

More information

Archive II. The archive. 26/May/15

Archive II. The archive. 26/May/15 Archive II The archive 26/May/15 What is an archive? Is a service that provides long-term storage and access of data. Long-term usually means ~5years or more. Archive is strictly not the same as a backup.

More information

A Data Diffusion Approach to Large Scale Scientific Exploration

A Data Diffusion Approach to Large Scale Scientific Exploration A Data Diffusion Approach to Large Scale Scientific Exploration Ioan Raicu Distributed Systems Laboratory Computer Science Department University of Chicago Joint work with: Yong Zhao: Microsoft Ian Foster:

More information

Useful storage options and research tools

Useful storage options and research tools Useful storage options and research tools Experiment with available tools and services that make research so much easier 08/11/2018 Ashley Rustin Senior Technical Specialist Core Infrastructure Services

More information

Research Elsevier

Research Elsevier Research Data @ Elsevier From generation through sharing and publishing to discovery IJsbrand Jan Aalbersberg SVP Journal and Data Solutions NDS, Boulder - June 12, 2014 Contributors: Anita de Waard Hylke

More information

Oracle Database 11g: New Features for Administrators DBA Release 2

Oracle Database 11g: New Features for Administrators DBA Release 2 Oracle Database 11g: New Features for Administrators DBA Release 2 Duration: 5 Days What you will learn This Oracle Database 11g: New Features for Administrators DBA Release 2 training explores new change

More information

Preservation Metadata Extraction and Collection : Tools and Techniques. Mat Black National Library of New Zealand Te Puna Matauranga o Aotearoa

Preservation Metadata Extraction and Collection : Tools and Techniques. Mat Black National Library of New Zealand Te Puna Matauranga o Aotearoa Preservation Metadata Extraction and Collection : Tools and Techniques Mat Black National Library of New Zealand Te Puna Matauranga o Aotearoa How to get what you need to keep what you ve got The stack

More information

High Performance Computing Course Notes Grid Computing I

High Performance Computing Course Notes Grid Computing I High Performance Computing Course Notes 2008-2009 2009 Grid Computing I Resource Demands Even as computer power, data storage, and communication continue to improve exponentially, resource capacities are

More information

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan Storage Virtualization Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan Storage Virtualization In computer science, storage virtualization uses virtualization to enable better functionality

More information

DSpace Fedora. Eprints Greenstone. Handle System

DSpace Fedora. Eprints Greenstone. Handle System Enabling Inter-repository repository Access Management between irods and Fedora Bing Zhu, Uni. of California: San Diego Richard Marciano Reagan Moore University of North Carolina at Chapel Hill May 18,

More information

Release notes for version 3.9.2

Release notes for version 3.9.2 Release notes for version 3.9.2 What s new Overview Here is what we were focused on while developing version 3.9.2, and a few announcements: Continuing improving ETL capabilities of EasyMorph by adding

More information

Research Data Management & Preservation: A Library Perspective

Research Data Management & Preservation: A Library Perspective Research Data Management & Preservation: A Library Perspective Brian Owen, Associate University Librarian Library Technology Services & Special Collections, Simon Fraser University Library LIBRARIES &

More information

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team

ODC and future EIDA/ EPOS-S plans within EUDAT2020. Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team B2Safe @ ODC and future EIDA/ EPOS-S plans within EUDAT2020 Luca Trani and the EIDA Team Acknowledgements to SURFsara and the B2SAFE team 3rd Conference, Amsterdam, The Netherlands, 24-25 September 2014

More information

Who is Docker and how he can help us? Heino Talvik

Who is Docker and how he can help us? Heino Talvik Who is Docker and how he can help us? Heino Talvik heino.talvik@seb.ee heino.talvik@gmail.com What is Docker? Software guy view: Marriage of infrastucture and Source Code Management Hardware guy view:

More information

CARE: the Comprehensive Archiver for Reproducible Execution

CARE: the Comprehensive Archiver for Reproducible Execution Introducing CARE CARE: the Comprehensive Archiver for Reproducible Execution STMicroelectronics, Grenoble, France TRUST 14 June 12th, 2014 Outline Introducing CARE 1 Introducing CARE 2 Typical use cases

More information

Metadata Workshop 3 March 2006 Part 1

Metadata Workshop 3 March 2006 Part 1 Metadata Workshop 3 March 2006 Part 1 Metadata overview and guidelines Amelia Breytenbach Ria Groenewald What metadata is Overview Types of metadata and their importance How metadata is stored, what metadata

More information

From Implementing CDISC Using SAS. Full book available for purchase here. About This Book... xi About The Authors... xvii Acknowledgments...

From Implementing CDISC Using SAS. Full book available for purchase here. About This Book... xi About The Authors... xvii Acknowledgments... From Implementing CDISC Using SAS. Full book available for purchase here. Contents About This Book... xi About The Authors... xvii Acknowledgments... xix Chapter 1: Implementation Strategies... 1 Why CDISC

More information

Store and Report Waters Empower Data with OpenLAB ECM and OpenLAB ECM Intelligent Reporter

Store and Report Waters Empower Data with OpenLAB ECM and OpenLAB ECM Intelligent Reporter Store and Report Waters Empower Data with OpenLAB ECM and OpenLAB ECM Intelligent Reporter Matthias Rupp Solution Architect Laboratory Informatics 1 OpenLAB ECM - Content Management Capabilities Centrally

More information

Geospatial Multistate Archive and Preservation Partnership Metadata Comparison

Geospatial Multistate Archive and Preservation Partnership Metadata Comparison Geospatial Multistate Archive and Preservation Partnership Metadata Comparison Cindy Clark, Utah s Automated Geographic Reference Center Glen McAninch, Kentucky Dept. for Libraries and Archives Steve Morris,

More information

Scientific Workflow Tools. Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego

Scientific Workflow Tools. Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego Scientific Workflow Tools Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego 1 escience Today Increasing number of Cyberinfrastructure (CI) technologies Data Repositories: Network

More information

Globus Online and HPSS. KEK, Tsukuba Japan October 16 20, 2017 Guangwei Che

Globus Online and HPSS. KEK, Tsukuba Japan October 16 20, 2017 Guangwei Che Globus Online and HPSS KEK, Tsukuba Japan October 16 20, 2017 Guangwei Che Agenda (1) What is Globus and Globus Online? How Globus Online works? Globus DSI module for HPSS Globus Online setup DSI module

More information

THE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid

THE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid THE GLOBUS PROJECT White Paper GridFTP Universal Data Transfer for the Grid WHITE PAPER GridFTP Universal Data Transfer for the Grid September 5, 2000 Copyright 2000, The University of Chicago and The

More information

SERVO - ACES Abstract

SERVO - ACES Abstract 1 of 6 12/27/2004 2:33 PM 2 of 6 12/27/2004 2:33 PM Implementing GIS Grid Services for the International Solid Earth Research Virtual Observatory Galip Aydin (1), Marlon Pierce (1), Geoffrey Fox (1), Mehmet

More information

Data Management Components for a Research Data Archive

Data Management Components for a Research Data Archive Data Management Components for a Research Data Archive Steven Worley and Bob Dattore Scientific Computing Division Computational and Information Systems Laboratory National Center for Atmospheric Research

More information

An Introduction to Using Lidar with ArcGIS and 3D Analyst

An Introduction to Using Lidar with ArcGIS and 3D Analyst FedGIS Conference February 24 25, 2016 Washington, DC An Introduction to Using Lidar with ArcGIS and 3D Analyst Jim Michel Outline Lidar Intro Lidar Management Las files Laz, zlas, conversion tools Las

More information

Using Resources of Multiple Grids with the Grid Service Provider. Micha?Kosiedowski

Using Resources of Multiple Grids with the Grid Service Provider. Micha?Kosiedowski Using Resources of Multiple Grids with the Grid Service Provider Micha?Kosiedowski Grid Service Provider The Grid Service Provider came as a result of research done within the PROGRESS project: Project

More information

NS Package Specification V0.1

NS Package Specification V0.1 NS Package Specification V0.1 Catalog Catalog...2 1. Scope...3 2. Terms, Definitions and Abbreviations... 3 3. NS CSAR Model Definition...3 3.1 OASIS CSAR Introduction... 3 3.2 NS CSAR Model Structure...

More information

Shifter at CSCS Docker Containers for HPC

Shifter at CSCS Docker Containers for HPC Shifter at CSCS Docker Containers for HPC HPC Advisory Council Swiss Conference Alberto Madonna, Lucas Benedicic, Felipe A. Cruz, Kean Mariotti - CSCS April 9 th, 2018 Table of Contents 1. Introduction

More information

Oracle Database 11g: New Features for Administrators Release 2

Oracle Database 11g: New Features for Administrators Release 2 Oracle University Contact Us: 0845 777 7711 Oracle Database 11g: New Features for Administrators Release 2 Duration: 5 Days What you will learn This course gives you the opportunity to learn about and

More information

ORACLE 11g R2 New Features

ORACLE 11g R2 New Features KNOWLEDGE POWER Oracle Grid Infrastructure Installation and Upgrade Enhancements Oracle Restart ASM Enhancements Storage Enhancements Data Warehouse and Partitioning Enhancements Oracle SecureFiles Security

More information

Supporting Data Stewardship Throughout the Data Life Cycle in the Solid Earth Sciences

Supporting Data Stewardship Throughout the Data Life Cycle in the Solid Earth Sciences Supporting Data Stewardship Throughout the Data Life Cycle in the Solid Earth Sciences Vicki L. Ferrini, Kerstin A. Lehnert, Suzanne M. Carbotte, and Leslie Hsu Lamont-Doherty Earth Observatory What is

More information

CS 470 Spring Virtualization and Cloud Computing. Mike Lam, Professor. Content taken from the following:

CS 470 Spring Virtualization and Cloud Computing. Mike Lam, Professor. Content taken from the following: CS 470 Spring 2018 Mike Lam, Professor Virtualization and Cloud Computing Content taken from the following: A. Silberschatz, P. B. Galvin, and G. Gagne. Operating System Concepts, 9 th Edition (Chapter

More information

Agile Data Management Challenges in Enterprise Big Data Landscape

Agile Data Management Challenges in Enterprise Big Data Landscape Agile Data Management Challenges in Enterprise Big Data Landscape Eric Simon, SAP Big Data October, 2017 1 Evolution Towards Enterprise Big Data Landscape administrator Data analyst Athena Redshift #123

More information

City of Bartlett. Request for Information. Document Management Solution

City of Bartlett. Request for Information. Document Management Solution City of Bartlett Request for Information Document Management Solution August 12, 2010 Dear Vendor: August 12, 2010 The City of Bartlett is actively pursuing a solution for Document Management. Included

More information

Academic Workflow for Research Repositories Using irods and Object Storage

Academic Workflow for Research Repositories Using irods and Object Storage 1 Academic Workflow for Research Repositories Using irods and Object Storage 2016 irods User s Group Meeting 9 June 2016 Randall Splinter, Ph.D. HPC Research Computing Solutions Architect RSplinter@ 770.633.2994

More information

Reproducibility and Reuse of Scientific Code Evolving the Role and Capabilities of Publishers

Reproducibility and Reuse of Scientific Code Evolving the Role and Capabilities of Publishers Reproducibility and Reuse of Scientific Code Evolving the Role and Capabilities of Publishers Reproducibility and Reuse of Scientific Code Evolving the Role and Capabilities of Publishers Michael Forster

More information

Using ESML in a Semantic Web Approach for Improved Earth Science Data Usability

Using ESML in a Semantic Web Approach for Improved Earth Science Data Usability Using in a Semantic Web Approach for Improved Earth Science Data Usability Rahul Ramachandran, Helen Conover, Sunil Movva and Sara Graves Information Technology and Systems Center University of Alabama

More information

SNIA 100 Year Archive Survey 2017 Thomas Rivera, CISSP

SNIA 100 Year Archive Survey 2017 Thomas Rivera, CISSP SNIA 100 Year Archive Survey 2017 Thomas Rivera, CISSP Data Security & Privacy Consultant Co-chair, Data Protection & Capacity Optimization (DPCO)Committee SNIA Secretary, Board of Directors SNIA Secretary,

More information

Technical Whitepaper. NetBackup PureDisk Technical Product Management. PureDisk Remote Office Protection. Export to NetBackup Feature

Technical Whitepaper. NetBackup PureDisk Technical Product Management. PureDisk Remote Office Protection. Export to NetBackup Feature Technical Whitepaper NetBackup PureDisk Technical Product Management PureDisk Remote Office Protection Export to NetBackup Feature 09 May 2007 Document Information Copyright The copyright to this document

More information

Digital The Harold B. Lee Library

Digital The Harold B. Lee Library Digital Preservation @ The Harold B. Lee Library CIMA 23 May 2013 How we got here? 1. Understanding Digital Preservation 2. Search for Content 3. Maintain Optical Disc Storage 4. In House Preservation

More information

How to contribute information to AGRIS

How to contribute information to AGRIS How to contribute information to AGRIS Guidelines on how to complete your registration form The dashboard includes information about you, your institution and your collection. You are welcome to provide

More information

PharmaSUG Paper AD03

PharmaSUG Paper AD03 PharmaSUG 2017 - Paper AD03 Three Issues and Corresponding Work-Around Solution for Generating Define.xml 2.0 Using Pinnacle 21 Enterprise Jeff Xia, Merck & Co., Inc., Rahway, NJ, USA Lugang (Larry) Xie,

More information

Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure

Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure Arnab K. Paul Virginia Tech Steven Tuecke University of Chicago Ryan Chard Argonne National Laboratory Ali R.

More information

The Earth System Grid: A Visualisation Solution. Gary Strand

The Earth System Grid: A Visualisation Solution. Gary Strand The Earth System Grid: A Visualisation Solution Gary Strand Introduction Acknowledgments PI s Ian Foster (ANL) Don Middleton (NCAR) Dean Williams (LLNL) ESG Development Team Veronika Nefedova (ANL) Ann

More information

The iplant Data Commons

The iplant Data Commons The iplant Data Commons Using irods to Facilitate Data Dissemination, Discovery, and Reproducibility Jeremy DeBarry, jdebarry@iplantcollaborative.org Tony Edgin, tedgin@iplantcollaborative.org Nirav Merchant,

More information

Data Management: the What, When and How

Data Management: the What, When and How Data Management: the What, When and How Data Management: the What DAMA(Data Management Association) states that "Data Resource Management is the development and execution of architectures, policies, practices

More information

Accelerating the Scientific Exploration Process with Kepler Scientific Workflow System

Accelerating the Scientific Exploration Process with Kepler Scientific Workflow System Accelerating the Scientific Exploration Process with Kepler Scientific Workflow System Jianwu Wang, Ilkay Altintas Scientific Workflow Automation Technologies Lab SDSC, UCSD project.org UCGrid Summit,

More information

CloudCenter for Developers

CloudCenter for Developers DEVNET-1198 CloudCenter for Developers Conor Murphy, Systems Engineer Data Centre Cisco Spark How Questions? Use Cisco Spark to communicate with the speaker after the session 1. Find this session in the

More information

The Earth System Grid Discovery and Semantic Web Technologies

The Earth System Grid Discovery and Semantic Web Technologies The Earth System Grid Discovery and Semantic Web Technologies Line Pouchard (Oak Ridge National Laboratory, Oak Ridge, TN 37831-6367, pouchardlc@ornl.gov) Luca Cinquini, Gary Strand (National Center for

More information

Product Brief: XenData X2500-SAS LTO-6 Digital Video Archive System

Product Brief: XenData X2500-SAS LTO-6 Digital Video Archive System Product Brief: XenData X2500-SAS LTO-6 Digital Video Archive System Updated: August 5, 2013 Overview The XenData X2500-SAS system includes XenData6 Workstation software which provides the archive, restore

More information

Decrypting your genome data privately in the cloud

Decrypting your genome data privately in the cloud Decrypting your genome data privately in the cloud Marc Sitges Data Manager@Made of Genes @madeofgenes The Human Genome 3.200 M (x2) Base pairs (bp) ~20.000 genes (~30%) (Exons ~1%) The Human Genome Project

More information

FLAT: A CLARIN-compatible repository solution based on Fedora Commons

FLAT: A CLARIN-compatible repository solution based on Fedora Commons FLAT: A CLARIN-compatible repository solution based on Fedora Commons Paul Trilsbeek The Language Archive Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Paul.Trilsbeek@mpi.nl Menzo

More information

VNS3 3.5 Container System Add-Ons

VNS3 3.5 Container System Add-Ons VNS3 3.5 Container System Add-Ons Instructions for VNS3 2015 copyright 2015 1 Table of Contents Introduction 3 Docker Container Network 7 Uploading a Image or Dockerfile 9 Allocating a Container 13 Saving

More information

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013 Data Replication: Automated move and copy of data PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013 Claudio Cacciari c.cacciari@cineca.it Outline The issue

More information

CDISC Standards End-to-End: Enabling QbD in Data Management Sam Hume

CDISC Standards End-to-End: Enabling QbD in Data Management Sam Hume CDISC Standards End-to-End: Enabling QbD in Data Management Sam Hume 1 Shared Health and Research Electronic Library (SHARE) A global electronic repository for developing, integrating

More information

Implementing CDISC Using SAS. Full book available for purchase here.

Implementing CDISC Using SAS. Full book available for purchase here. Implementing CDISC Using SAS. Full book available for purchase here. Contents About the Book... ix About the Authors... xv Chapter 1: Implementation Strategies... 1 The Case for Standards... 1 Which Models

More information

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018 0 Welcome to the Pure International Conference Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018 1 Mendeley Data Use Synergies with Pure to Showcase Additional Research Outputs Nikhil Joshi Solutions

More information

Introduction to Grid Computing

Introduction to Grid Computing Milestone 2 Include the names of the papers You only have a page be selective about what you include Be specific; summarize the authors contributions, not just what the paper is about. You might be able

More information

RENKU - Reproduce, Reuse, Recycle Research. Rok Roškar and the SDSC Renku team

RENKU - Reproduce, Reuse, Recycle Research. Rok Roškar and the SDSC Renku team RENKU - Reproduce, Reuse, Recycle Research Rok Roškar and the SDSC Renku team Renku-Reana workshop @ CERN 26.06.2018 Goals of Renku 1. Provide the means to create reproducible data science 2. Facilitate

More information

Mini Science DMZ (aka Mini-DMZ)

Mini Science DMZ (aka Mini-DMZ) Mini Science DMZ (aka Mini-DMZ) Steven Wallace ssw@iu.edu Supported by the NSF via a CICI: Secure Data Architecture Award Inspiration During our initial planning process, collecting use cases and user

More information

Red Hat Atomic Details Dockah, Dockah, Dockah! Containerization as a shift of paradigm for the GNU/Linux OS

Red Hat Atomic Details Dockah, Dockah, Dockah! Containerization as a shift of paradigm for the GNU/Linux OS Red Hat Atomic Details Dockah, Dockah, Dockah! Containerization as a shift of paradigm for the GNU/Linux OS Daniel Riek Sr. Director Systems Design & Engineering In the beginning there was Stow... and

More information

Long-term digital preservation of UNSWorks

Long-term digital preservation of UNSWorks Long-term digital preservation of UNSWorks UNSW Library Arif Shaon, Maude Frances CAUL Community Days 2014 UNSW Australia The University of New South Wales at a Glance: https://www.unsw.edu.au/sites/default/files/documents/unsw4009_miniguide_2012_aw2_v2.pdf

More information

WS-Resource Framework: Globus Alliance Perspectives

WS-Resource Framework: Globus Alliance Perspectives : Globus Alliance Perspectives Ian Foster Argonne National Laboratory University of Chicago Globus Alliance www.mcs.anl.gov/~foster Perspectives Why is WSRF important? How does WSRF relate to the Open

More information

Big Data, Big Compute, Big Interac3on Machines for Future Biology. Rick Stevens. Argonne Na3onal Laboratory The University of Chicago

Big Data, Big Compute, Big Interac3on Machines for Future Biology. Rick Stevens. Argonne Na3onal Laboratory The University of Chicago Assembly Annota3on Modeling Design Big Data, Big Compute, Big Interac3on Machines for Future Biology Rick Stevens stevens@anl.gov Argonne Na3onal Laboratory The University of Chicago There are no solved

More information

Data Management at NIST

Data Management at NIST Data Management at NIST John Henry J. Scott 1 Raymond Plante Office of Data and Informatics (ODI) Material Measurement Laboratory National Institute of Standards and Technology Tuesday, March 20, 2018

More information

DRS 2 Glossary. access flag An object access flag records the least restrictive access flag recorded for one of the object s files: ο ο

DRS 2 Glossary. access flag An object access flag records the least restrictive access flag recorded for one of the object s files: ο ο Harvard University Information Technology Library Technology Services DRS 2 Glossary access flag An object access flag records the least restrictive access flag recorded for one of the object s files:

More information

Reproducible & Transparent Computational Science with Galaxy. Jeremy Goecks The Galaxy Team

Reproducible & Transparent Computational Science with Galaxy. Jeremy Goecks The Galaxy Team Reproducible & Transparent Computational Science with Galaxy Jeremy Goecks The Galaxy Team 1 Doing Good Science Previous talks: performing an analysis setting up and scaling Galaxy adding tools libraries

More information

Data Curation Profile Human Genomics

Data Curation Profile Human Genomics Data Curation Profile Human Genomics Profile Author Profile Author Institution Name Contact J. Carlson N. Brown Purdue University J. Carlson, jrcarlso@purdue.edu Date of Creation October 27, 2009 Date

More information

Drone2Map for ArcGIS: Bring Drone Imagery into ArcGIS

Drone2Map for ArcGIS: Bring Drone Imagery into ArcGIS Drone2Map for ArcGIS: Bring Drone Imagery into ArcGIS Mike Sweeney 1 Drone2Map for ArcGIS Turn Drones into Enterprise Productivity Tools ArcGIS Drone2Map for ArcGIS Create 2D and 3D products from raw drone

More information

Service and Cloud Computing Lecture 10: DFS2 Prof. George Baciu PQ838

Service and Cloud Computing Lecture 10: DFS2   Prof. George Baciu PQ838 COMP4442 Service and Cloud Computing Lecture 10: DFS2 www.comp.polyu.edu.hk/~csgeorge/comp4442 Prof. George Baciu PQ838 csgeorge@comp.polyu.edu.hk 1 Preamble 2 Recall the Cloud Stack Model A B Application

More information

Server-Side Workflow Execution using Data Grid Technology for. Reproducible Analyses of Data-Intensive Hydrologic Systems

Server-Side Workflow Execution using Data Grid Technology for. Reproducible Analyses of Data-Intensive Hydrologic Systems Server-Side Workflow Execution using Data Grid Technology for Reproducible Analyses of Data-Intensive Hydrologic Systems Bakinam T. Essawy (orcid.org/0000-0003-2295-7981) Department of Civil and Environmental

More information

Docker A FRAMEWORK FOR DATA INTENSIVE COMPUTING

Docker A FRAMEWORK FOR DATA INTENSIVE COMPUTING Docker A FRAMEWORK FOR DATA INTENSIVE COMPUTING Agenda Intro / Prep Environments Day 1: Docker Deep Dive Day 2: Kubernetes Deep Dive Day 3: Advanced Kubernetes: Concepts, Management, Middleware Day 4:

More information

Pangeo. A community-driven effort for Big Data geoscience

Pangeo. A community-driven effort for Big Data geoscience Pangeo A community-driven effort for Big Data geoscience !2 What Drives Progress in GEOScience? q soil New Ideas 8 < q rain2q ix2q sx z50 liq;z 5 @w : 2K soil @z 1Ksoil z > 0 New Observations New Simulations

More information

An Experimentation Workbench for Replayable Networking Research

An Experimentation Workbench for Replayable Networking Research An Experimentation Workbench for Replayable Networking Research Eric Eide,, Leigh Stoller, and Jay Lepreau University of Utah, School of Computing NSDI 2007 / April 12, 2007 Repeated Research A scientific

More information

Production Petascale Climate Data Replication at NCI Lustre and our engagement with the Earth Systems Grid Federation (ESGF)

Production Petascale Climate Data Replication at NCI Lustre and our engagement with the Earth Systems Grid Federation (ESGF) Joseph Antony, Andrew Howard, Jason Andrade, Ben Evans, Claire Trenham, Jingbo Wang Production Petascale Climate Data Replication at NCI Lustre and our engagement with the Earth Systems Grid Federation

More information

Data Movement & Tiering with DMF 7

Data Movement & Tiering with DMF 7 Data Movement & Tiering with DMF 7 Kirill Malkin Director of Engineering April 2019 Why Move or Tier Data? We wish we could keep everything in DRAM, but It s volatile It s expensive Data in Memory 2 Why

More information

NetCDF and Scientific Data Durability. Russ Rew, UCAR Unidata ESIP Federation Summer Meeting

NetCDF and Scientific Data Durability. Russ Rew, UCAR Unidata ESIP Federation Summer Meeting NetCDF and Scientific Data Durability Russ Rew, UCAR Unidata ESIP Federation Summer Meeting 2009-07-08 For preserving data, is format obsolescence a non-issue? Why do formats (and their access software)

More information

An Experimentation Workbench for Replayable Networking Research

An Experimentation Workbench for Replayable Networking Research An Experimentation Workbench for Replayable Networking Research Eric Eide, Leigh Stoller, and Jay Lepreau Repeated Research A scientific community advances when its experiments are repeated University

More information

Centre de Calcul de l Institut National de Physique Nucléaire et de Physique des Particules. Singularity overview. Vanessa HAMAR

Centre de Calcul de l Institut National de Physique Nucléaire et de Physique des Particules. Singularity overview. Vanessa HAMAR Centre de Calcul de l Institut National de Physique Nucléaire et de Physique des Particules Singularity overview Vanessa HAMAR Disclaimer } The information in this presentation was compiled from different

More information

ECE/CS 3710 Computer Design Lab Lab 2 Mini-MIPS processor Controller modification, memory mapping, assembly code

ECE/CS 3710 Computer Design Lab Lab 2 Mini-MIPS processor Controller modification, memory mapping, assembly code ECE/CS 3710 Computer Design Lab Lab 2 Mini-MIPS processor Controller modification, memory mapping, assembly code Due Tuesday, September 22nd, 2009 Laboratory Objectives Understand and extend a very very

More information

DataONE: Open Persistent Access to Earth Observational Data

DataONE: Open Persistent Access to Earth Observational Data Open Persistent Access to al Robert J. Sandusky, UIC University of Illinois at Chicago The Net Partners Update: ONE and the Conservancy December 14, 2009 Outline NSF s Net Program ONE Introduction Motivating

More information

Building an Interoperable Materials Infrastructure

Building an Interoperable Materials Infrastructure Building an Interoperable Materials Infrastructure May 2, 2016 Workshop Co-Organizers Jim Warren, Carrie Campbell, Ian Foster, Laura Bartolo CHiMaD Peter Voorhees, Juan de Pablo, Greg Olson CH MaD Summary

More information