Dataverse: Modular Storage and Migration to the Cloud


1 Dataverse: Modular Storage and Migration to the Cloud
Gustavo Durand, Dataverse Technical Lead / Architect
Leonid Andreev, Dataverse Senior Developer

2 Dataverse

3 Overview
- An open-source platform to publish, cite, and archive research data
- Built to support multiple types of data, users, and workflows
- Developed at Harvard's Institute for Quantitative Social Science (IQSS) since 2006
- Development funded by IQSS and with grants, in collaboration with institutions around the world
- 15 on the core team: developers, designers, UI/UX, metadata specialists, curation manager

4 Dataverse Features - Data
- Persistent IDs / URLs
  - DataCite
  - Handle
- Automatically Generated Citations with attribution
- Compliant with FAIR and data citation principles
- Domain-specific Metadata
- Versioning
- File Storage
  - Local
  - Swift (OpenStack)
  - S3 (Amazon)

5 Dataverse Features - Users
- Multiple Sign In options
  - Native
  - Shibboleth
  - OAuth (ORCID)
- Dataverses within Dataverses
- Branding
- Widgets

6 Dataverse Features - Workflows
- Permissions
- Access Controls and Terms of Use
- Publishing Workflows
- Private URLs
- Upload / Download Workflows
  - Browser
  - Dropbox
  - Rsync (for big data packages)

7 Dataverse Features - Interoperability
- APIs
  - SWORD
  - Native
- Harvesting (OAI-PMH)
  - Client
  - Server
- Modular External Tools
  - Explore
  - Configure

8 Dataverse Technology
- Glassfish Server 4.1
- Java SE8
- Java EE7
  - Presentation: JSF (PrimeFaces), RESTful API
  - Business: EJB, Transactions, Asynchronous, Timers
  - Storage: JPA (Entities), Bean Validation
- Storage: Postgres, Solr, File System / Swift / S3

9 Dataverse Development Process
Inbox, Backlog, This Sprint, Development, Code Review, QA, Done

10 (some) Collaborations
- SBGrid Data: Large Data and Support
- Massachusetts Open Cloud: Big Data Storage and Compute Access (OpenStack)
- DANS/CIMMYT: Handles Support
- ResearchSpace: API Java Client Library
- Provenance: W3C PROV

11 Dataverse Community
34 installations around the world

12 Dataverse Community
- 75+ code contributors outside of the Core Team
- Hundreds of members of the Dataverse Community: developers, researchers, librarians, data scientists
- Dataverse Google Group
- Dataverse Community Calls
- Dataverse Community Meeting
- Global Dataverse Community Consortium

13 Modularity: External Tools

14 Compute/Explore Access

15 External Tools: Two Ravens and World Map

16 External Tools: Data Explorer

17 External Tools: PSI Budgeteer
The budgeteer lets users select which statistics they would like to calculate and gives them estimates of how accurately each statistic can be computed. Users can also redistribute their privacy budget according to which statistics they think are most valuable in their dataset.
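To make the budget trade-off concrete, here is a minimal sketch in Java. It is not PSI's implementation. It assumes each statistic is a simple count released with the Laplace mechanism (sensitivity 1), where the noise scale, and therefore the expected absolute error, is 1/epsilon. The class name BudgetSketch, its methods, and the example weights are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of splitting a privacy budget; not taken from PSI.
class BudgetSketch {

    // Split a total budget (epsilon) across statistics in proportion to user-chosen weights.
    static Map<String, Double> allocate(double totalEpsilon, Map<String, Double> weights) {
        double sum = 0.0;
        for (double w : weights.values()) {
            sum += w;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            shares.put(e.getKey(), totalEpsilon * e.getValue() / sum);
        }
        return shares;
    }

    // Assumption: a count with sensitivity 1 under Laplace noise of scale b = 1/epsilon,
    // whose expected absolute error equals b.
    static double expectedAbsError(double epsilon) {
        return 1.0 / epsilon;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("total respondents", 3.0); // valued more, so it gets more budget
        weights.put("respondents over 65", 1.0);
        for (Map.Entry<String, Double> e : allocate(1.0, weights).entrySet()) {
            System.out.println(e.getKey() + ": epsilon=" + e.getValue()
                    + ", expected error=" + expectedAbsError(e.getValue()));
        }
    }
}
```

Under these assumptions, moving budget toward one statistic lowers its expected error and raises the error of the others, which is the trade-off the slide describes.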

18 Data Storage
How data files are handled in Dataverse

19 (one real life design and development story)

20 Let's talk about common pitfalls when designing complex applications
- Quick hacks save time; incur costs later
- Overengineering. (Are you designing too far into the future? Is it an investment into making future development easier or a waste of resources?)
The design and development story behind this presentation may be an example of a reasonably balanced mix of expandability and simplicity.

21 Datasets = Metadata + Files!

22 Typical metadata of a Dataverse dataset

23 Data Files in a researcher's dataset

24 How do we store these things? - Early design prototype
- Metadata: stored in a SQL database
- Files: stored on the filesystem
- Implementation: (very much simplified!)

25 But then we thought: let's make it modular!
- StorageIO: an added layer of abstraction between application and file storage
- Individual drivers for specific types of physical storage
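The layering described on this slide might look roughly like the following Java sketch. It is a simplified illustration, not the actual StorageIO code: the names StorageSketch, StorageDriver, LocalFileDriver, and InMemoryDriver are hypothetical, and the in-memory driver only stands in for a second backend.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Application code: it only ever sees the abstract driver type.
class StorageSketch {
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Save text through a driver and read it back, without knowing the backend.
    static String roundTrip(StorageDriver driver, String text) throws IOException {
        driver.save(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        try (InputStream in = driver.read()) {
            return new String(readAll(in), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("storage-sketch", ".dat");
        try {
            System.out.println(roundTrip(new LocalFileDriver(tmp), "hello from disk"));
            System.out.println(roundTrip(new InMemoryDriver(), "hello from memory"));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}

// The abstraction layer: one small contract that every backend implements.
abstract class StorageDriver {
    abstract void save(InputStream data) throws IOException;
    abstract InputStream read() throws IOException;
}

// Driver for a local filesystem path.
class LocalFileDriver extends StorageDriver {
    private final Path path;
    LocalFileDriver(Path path) { this.path = path; }
    @Override void save(InputStream data) throws IOException {
        Files.copy(data, path, StandardCopyOption.REPLACE_EXISTING);
    }
    @Override InputStream read() throws IOException {
        return Files.newInputStream(path);
    }
}

// Stand-in for a remote backend; keeps the bytes in memory.
class InMemoryDriver extends StorageDriver {
    private byte[] bytes = new byte[0];
    @Override void save(InputStream data) throws IOException {
        bytes = StorageSketch.readAll(data);
    }
    @Override InputStream read() {
        return new ByteArrayInputStream(bytes);
    }
}
```

Because roundTrip depends only on StorageDriver, adding another backend means writing one more subclass and leaving the calling code untouched.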

26 Real life use case, early years
All the files were stored on a local filesystem (so did we even need any of that modularity?)

27 Then suddenly cloud happened!
Exciting new projects and collaborations, and the need to support new data storage methods.
- MassOpenCloud - cloud computing collaboration, Swift support needed
- Harvard Dataverse migration to AWS, S3 support needed
- SBGrid Databank - a collaboration with Harvard Medical School; Big Data/complex file package model
(... and we were prepared)

28 New storage drivers added
With the StorageIO framework in place, it was possible to quickly add driver implementations for AWS S3 and OpenStack Swift.

29 Some code examples... Top level StorageIO interface

    package edu.harvard.iq.dataverse.dataaccess;

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.channels.WritableByteChannel;
    import java.nio.file.Path;

    public abstract class StorageIO {
        public abstract void open(DataAccessOption... option) throws IOException;
        public abstract WritableByteChannel getWriteChannel() throws IOException;
        public abstract InputStream getInputStream() throws IOException;
        public abstract OutputStream getOutputStream() throws IOException;
        public abstract void delete() throws IOException;
        public abstract void savePath(Path fileSystemPath) throws IOException;
        public abstract void saveInputStream(InputStream inputStream) throws IOException;
        // ... (listing cut off on the slide)
    }

30 Code sample: FileAccessIO (Filesystem storage driver implementation)

    public void savePath(Path fileSystemPath) throws IOException {
        // Resolve where this driver keeps the file on the local filesystem.
        Path outputPath = getFileSystemPath();
        if (outputPath == null) {
            throw new FileNotFoundException("FileAccessIO: Could not locate physical file for writing.");
        }
        // Copy the file into place, overwriting any previous copy.
        Files.copy(fileSystemPath, outputPath, StandardCopyOption.REPLACE_EXISTING);
    }

31 Code sample: SwiftAccessIO (SWIFT storage driver implementation)
It's not that much code, really (and that's the ...)

    public void savePath(Path fileSystemPath) throws IOException {
        try {
            // Upload the local file into the Swift object backing this driver.
            File inputFile = fileSystemPath.toFile();
            swiftFileObject.uploadObject(inputFile);
        } catch (Exception ioex) {
            throw new IOException("SwiftAccessIO: Unknown exception occurred while uploading a local file into a Swift StoredObject");
        }
    }

32 Code sample: S3AccessIO (AWS S3 storage driver implementation)

    public void savePath(Path fileSystemPath) throws IOException {
        try {
            // Upload the local file to the bucket and key backing this driver.
            File inputFile = fileSystemPath.toFile();
            s3.putObject(new PutObjectRequest(bucketName, key, inputFile));
        } catch (SdkClientException ioex) {
            throw new IOException("S3AccessIO: Unknown exception occurred while uploading a local file into S3Object");
        }
    }
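The Swift and S3 samples both translate a backend-specific failure into a plain IOException with a fixed message. One possible refinement, which is not shown on the slides, is to pass the original exception along as the cause so that the backend's error detail is kept. The sketch below illustrates the idea without any cloud SDK: the class CauseSketch and its upload method are hypothetical stand-ins for a real client call.

```java
import java.io.IOException;

class CauseSketch {
    // Hypothetical stand-in for a backend client call (e.g. an S3 or Swift upload).
    static void upload(boolean fail) {
        if (fail) {
            throw new IllegalStateException("simulated backend failure: access denied");
        }
    }

    static void savePath(boolean fail) throws IOException {
        try {
            upload(fail);
        } catch (RuntimeException ex) {
            // The two-argument constructor records the original exception as the cause.
            throw new IOException("Upload failed while saving a local file", ex);
        }
    }

    public static void main(String[] args) {
        try {
            savePath(true);
        } catch (IOException ex) {
            System.out.println(ex.getMessage() + " <- " + ex.getCause().getMessage());
        }
    }
}
```

Callers still handle a single exception type, and the wrapped cause remains available in logs and stack traces.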

33 Files, as seen by Dataverse users (They all look the same to us!)

34 File records in the database, living side by side...
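One way records for different backends can sit side by side in a single table is to encode the backend in the stored location string and pick a driver from it. The sketch below assumes location strings with the prefixes file://, s3://, and swift://. That format, the class LocationSketch, and the example values are illustrative assumptions, not something the slides confirm.

```java
class LocationSketch {
    enum Backend { FILE, S3, SWIFT }

    // Decide which driver should handle a record, based on an assumed prefix convention.
    static Backend backendOf(String storageLocation) {
        if (storageLocation.startsWith("s3://")) {
            return Backend.S3;
        }
        if (storageLocation.startsWith("swift://")) {
            return Backend.SWIFT;
        }
        if (storageLocation.startsWith("file://")) {
            return Backend.FILE;
        }
        throw new IllegalArgumentException("Unknown storage location: " + storageLocation);
    }

    public static void main(String[] args) {
        // Hypothetical rows, as they might appear in one database column.
        String[] rows = {
            "file://example-file-id",
            "s3://example-bucket:example-file-id",
            "swift://example-container:example-file-id"
        };
        for (String row : rows) {
            System.out.println(row + " -> " + backendOf(row));
        }
    }
}
```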

35 Dataverse and extended data storage models in practice.

36 Our users like to upload files...

37 SBGrid Databank
A collaboration between SBGrid/Harvard Medical School, Dataverse and Globus (with support from the Helmsley Trust Biomedical Research Infrastructure).

38 SBDB: Big Data Support and Multiple Storage Locations, package files

39 Cloud Dataverse: a collaboration with Massachusetts Open Cloud and Boston University
MOC is a public/open research cloud that runs on OpenStack

40 MOC Dataverse: Integration with a Big Data Analytics Platform
- Big Data Analytics using OpenStack Nova (Compute) and Sahara
- Sahara: cluster provisioning of data processing frameworks
- Hadoop/Spark/Pig/Hive/Storm
- Abstraction for easy job submission
- Direct Swift I/O integration

41

42 Thank you!
Please get in touch with us!
Google Group, GitHub, IRC, Twitter - dataverse.org/contact
support@dataverse.org

Securing Dataverse with an Adapted Command Design Pattern. Gustavo Durand, Michael Bar-Sinai, Merce Crosas SecDev - September 26, 2017

Securing Dataverse with an Adapted Command Design Pattern. Gustavo Durand, Michael Bar-Sinai, Merce Crosas SecDev - September 26, 2017 Securing Dataverse with an Adapted Command Design Pattern Gustavo Durand, Michael Bar-Sinai, Merce Crosas SecDev - September 26, 2017 Introduction An application design that enforces permission-based policies

More information

The Open Monolith. Keeping Your Codebase (and Your Headaches) CON3449. Matthew sbgrid.

The Open Monolith. Keeping Your Codebase (and Your Headaches) CON3449. Matthew sbgrid. CON3449 The Open Monolith Keeping Your Codebase (and Your Headaches) Small Michael Bar-Sinai @michbarsinai mbarsinai.com Matthew Dunlap @disbliss sbgrid.org/about/staff/ @dataverseorg Large, monolithic

More information

DATAVERSE FOR JOURNALS

DATAVERSE FOR JOURNALS DATAVERSE FOR JOURNALS Mercè Crosas, Ph.D. Director of Data Science IQSS, Harvard University @mercecrosas Society for Scholarly Publishing 37 th Meeting, 28, May, 2015 About Dataverse Science requires

More information

Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard

Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard University @mercecrosas mercecrosas.com Open Research Cloud, May 11, 2017 Best Practices

More information

Demos: DMP Assistant and Dataverse

Demos: DMP Assistant and Dataverse Demos: DMP Assistant and Dataverse Alexandra Cooper, Data Services Coordinator, Queen s University Meghan Goodchild, RDM Systems Librarian, Queen s University/Scholars Portal Overview of session Research

More information

Data publication and discovery with Globus

Data publication and discovery with Globus Data publication and discovery with Globus Questions and comments to outreach@globus.org The Globus data publication and discovery services make it easy for institutions and projects to establish collections,

More information

Update on Dataverse Dryad-Dataverse Community Meeting. Mercè Crosas, Elizabeth Quigley & Eleni Castro. Data Science > IQSS > Harvard University

Update on Dataverse Dryad-Dataverse Community Meeting. Mercè Crosas, Elizabeth Quigley & Eleni Castro. Data Science > IQSS > Harvard University Update on Dataverse Image credit: David Bygott (CC-BY-NC-SA) 2014 Dryad-Dataverse Community Meeting Mercè Crosas, Elizabeth Quigley & Eleni Castro Data Science > IQSS > Harvard University Introduction

More information

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT EUDAT A European Collaborative Data Infrastructure Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT OpenAire Interoperability Workshop Braga, Feb. 8, 2013 EUDAT Key facts

More information

A Data Sharing System

A Data Sharing System Dataverse Network A Data Sharing System Merce Crosas (mcrosas@hmdc.harvard.edu) Director of Product Development Institute of Quantitative Social Science (IQSS) Harvard University A long history of data

More information

Dataverse and DataTags

Dataverse and DataTags NFAIS Open Data Fostering Open Science June 20, 2016 Dataverse and DataTags Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitive Social Science Harvard University @mercecrosas

More information

Science Panel Discussion presentation: "A Data Sharing Story"

Science Panel Discussion presentation: A Data Sharing Story University of Massachusetts Medical School escholarship@umms University of Massachusetts and New England Area Librarian e-science Symposium 2012 e-science Symposium Apr 4th, 10:45 AM - 11:15 AM Science

More information

Metadata Ingestion and Processinng

Metadata Ingestion and Processinng biomedical and healthcare Data Discovery Index Ecosystem Ingestion and Processinng Jeffrey S. Grethe, Ph.D. 2017 BioCADDIE All Hands Meeting prototype Ingestion Indexing Repositories Ingestion ElasticSearch

More information

Helping Journals to Upgrade Data Publications for Reusable Research

Helping Journals to Upgrade Data Publications for Reusable Research Helping Journals to Upgrade Data Publications for Reusable Research Sonia Barbosa (Project Manager) Eleni Castro (Project Coordinator) Ins9tute for Quan9ta9ve Social Science (IQSS) Harvard University @thedataorg

More information

The Materials Data Facility

The Materials Data Facility The Materials Data Facility Ben Blaiszik (blaiszik@uchicago.edu), Kyle Chard (chard@uchicago.edu) Ian Foster (foster@uchicago.edu) materialsdatafacility.org What is MDF? We aim to make it simple for materials

More information

BPMN Processes for machine-actionable DMPs

BPMN Processes for machine-actionable DMPs BPMN Processes for machine-actionable DMPs Simon Oblasser & Tomasz Miksa Contents Start DMP... 2 Specify Size and Type... 3 Get Cost and Storage... 4 Storage Configuration and Cost Estimation... 4 Storage

More information

SHARING YOUR RESEARCH DATA VIA

SHARING YOUR RESEARCH DATA VIA SHARING YOUR RESEARCH DATA VIA SCHOLARBANK@NUS MEET OUR TEAM Gerrie Kow Head, Scholarly Communication NUS Libraries gerrie@nus.edu.sg Estella Ye Research Data Management Librarian NUS Libraries estella.ye@nus.edu.sg

More information

Research Data Edinburgh: MANTRA & Edinburgh DataShare. Stuart Macdonald EDINA & Data Library University of Edinburgh

Research Data Edinburgh: MANTRA & Edinburgh DataShare. Stuart Macdonald EDINA & Data Library University of Edinburgh Research Data Services @ Edinburgh: MANTRA & Edinburgh DataShare Stuart Macdonald EDINA & Data Library University of Edinburgh NFAIS Open Data Seminar, 16 June 2016 Context EDINA and Data Library are a

More information

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data

EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data EUDAT-B2FIND A FAIR and Interdisciplinary Discovery Portal for Research Data Heinrich Widmann, DKRZ Claudia Martens, DKRZ Open Science Days, Berlin, 17 October 2017 www.eudat.eu EUDAT receives funding

More information

Dataverse 4.0 & Beyond. Eleni Castro > Ins/tute for Quan/ta/ve Social Science (IQSS), Harvard University

Dataverse 4.0 & Beyond. Eleni Castro > Ins/tute for Quan/ta/ve Social Science (IQSS), Harvard University Dataverse 4.0 & Beyond ì Eleni Castro > Ins/tute for Quan/ta/ve Social Science (IQSS), Harvard University 2 Data Science Team Data Cura/on & Stewardship Informa/on Scien/sts Researchers Sta/s/cal Innova/on

More information

BlackPearl Customer Created Clients Using Free & Open Source Tools

BlackPearl Customer Created Clients Using Free & Open Source Tools BlackPearl Customer Created Clients Using Free & Open Source Tools December 2017 Contents A B S T R A C T... 3 I N T R O D U C T I O N... 3 B U L D I N G A C U S T O M E R C R E A T E D C L I E N T...

More information

Storage Made Simple: Preserving Digital Objects with bepress Archive and Amazon S3

Storage Made Simple: Preserving Digital Objects with bepress Archive and Amazon S3 University of Massachusetts Medical School escholarship@umms Digital Commons New England User Group 2017 Digital Commons New England User Group Jul 28th, 1:55 PM Storage Made Simple: Preserving Digital

More information

FROM VSTS TO AZURE DEVOPS

FROM VSTS TO AZURE DEVOPS #DOH18 FROM VSTS TO AZURE DEVOPS People. Process. Products. Gaetano Paternò @tanopaterno info@gaetanopaterno.it 2 VSTS #DOH18 3 Azure DevOps Azure Boards (ex Work) Deliver value to your users faster using

More information

Cloud platforms. T Mobile Systems Programming

Cloud platforms. T Mobile Systems Programming Cloud platforms T-110.5130 Mobile Systems Programming Agenda 1. Motivation 2. Different types of cloud platforms 3. Popular cloud services 4. Open-source cloud 5. Cloud on this course 6. Mobile Edge Computing

More information

Astronomy Dataverse: enabling astronomer data publishing.

Astronomy Dataverse: enabling astronomer data publishing. Astronomy Dataverse: enabling astronomer data publishing http://theastrodata.org Harvard-Smithsonian Center for Astrophysics References: Nielsen, M. The Future of Science http://michaelnielsen.org/blog/the-future-of-science-2/

More information

Jenkins: A complete solution. From Continuous Integration to Continuous Delivery For HSBC

Jenkins: A complete solution. From Continuous Integration to Continuous Delivery For HSBC Jenkins: A complete solution From Integration to Delivery For HSBC Rajesh Kumar DevOps Architect @RajeshKumarIN www.rajeshkumar.xyz Agenda Why Jenkins? Introduction and some facts about Jenkins Supported

More information

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan Storage Virtualization Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan Storage Virtualization In computer science, storage virtualization uses virtualization to enable better functionality

More information

Click to edit Master title style

Click to edit Master title style Click to edit Master title style Portage Research Management Click to editdata Master subtitle styleplatforms Dataverse North and the Federated Research Data Repository (FRDR) Sept 18, 2017 - Amber Leahey

More information

Azure DevOps. Randy Pagels Intelligent Cloud Technical Specialist Great Lakes Region

Azure DevOps. Randy Pagels Intelligent Cloud Technical Specialist Great Lakes Region Azure DevOps Randy Pagels Intelligent Cloud Technical Specialist Great Lakes Region What is DevOps? People. Process. Products. Build & Test Deploy DevOps is the union of people, process, and products to

More information

DATA SHARING FOR BETTER SCIENCE

DATA SHARING FOR BETTER SCIENCE DATA SHARING FOR BETTER SCIENCE THE DATAVERSE PROJECT Mercè Crosas, Institute for Quantitative Social Science, Harvard University @mercecrosas MAX PLANCK INSTITUTE FOR RADIOASTRONOMY, SEPTEMBER 12, 2017

More information

Red Hat OpenStack Platform 10 Product Guide

Red Hat OpenStack Platform 10 Product Guide Red Hat OpenStack Platform 10 Product Guide Overview of Red Hat OpenStack Platform OpenStack Team Red Hat OpenStack Platform 10 Product Guide Overview of Red Hat OpenStack Platform OpenStack Team rhos-docs@redhat.com

More information

CloudMan cloud clusters for everyone

CloudMan cloud clusters for everyone CloudMan cloud clusters for everyone Enis Afgan usecloudman.org This is accessibility! But only sometimes So, there are alternatives BUT WHAT IF YOU WANT YOUR OWN, QUICKLY The big picture A. Users in different

More information

Managing Data at Scale: Microservices and Events. Randy linkedin.com/in/randyshoup

Managing Data at Scale: Microservices and Events. Randy linkedin.com/in/randyshoup Managing Data at Scale: Microservices and Events Randy Shoup @randyshoup linkedin.com/in/randyshoup Background VP Engineering at Stitch Fix o Combining Art and Science to revolutionize apparel retail Consulting

More information

DOIs for Research Data

DOIs for Research Data DOIs for Research Data Open Science Days 2017, 16.-17. Oktober 2017, Berlin Britta Dreyer, Technische Informationsbibliothek (TIB) http://orcid.org/0000-0002-0687-5460 Scope 1. DataCite Services 2. Data

More information

Transform Your Enterprise Search and ediscovery on the AWS Cloud.

Transform Your Enterprise Search and ediscovery on the AWS Cloud. Transform Your Enterprise Search and ediscovery on the AWS Cloud. Welcome Sheri Sullivan Senior Partner Marketing Manager Amazon Web Services Webinar Overview Submit Your Questions using the Q&A tool.

More information

<Insert Picture Here> Future<JavaEE>

<Insert Picture Here> Future<JavaEE> Future Jerome Dochez, GlassFish Architect The following/preceding is intended to outline our general product direction. It is intended for information purposes only, and may

More information

Building for the Future

Building for the Future Building for the Future The National Digital Newspaper Program Deborah Thomas US Library of Congress DigCCurr 2007 Chapel Hill, NC April 19, 2007 1 What is NDNP? Provide access to historic newspapers Select

More information

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018

Welcome to the Pure International Conference. Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018 0 Welcome to the Pure International Conference Jill Lindmeier HR, Brand and Event Manager Oct 31, 2018 1 Mendeley Data Use Synergies with Pure to Showcase Additional Research Outputs Nikhil Joshi Solutions

More information

Science 2.0 VU Big Science, e-science and E- Infrastructures + Bibliometric Network Analysis

Science 2.0 VU Big Science, e-science and E- Infrastructures + Bibliometric Network Analysis W I S S E N n T E C H N I K n L E I D E N S C H A F T Science 2.0 VU Big Science, e-science and E- Infrastructures + Bibliometric Network Analysis Elisabeth Lex KTI, TU Graz WS 2015/16 u www.tugraz.at

More information

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS) Institutional Repository using DSpace Yatrik Patel Scientist D (CS) yatrik@inflibnet.ac.in What is Institutional Repository? Institutional repositories [are]... digital collections capturing and preserving

More information

Harvard s Dataverse Network:

Harvard s Dataverse Network: Harvard s Dataverse Network: A JavaServer Faces/EJB 3.0 Technology Data Sharing Solution on Java EE 5 Merce Crosas, Ph.D./Robert Treacy Senior Manager/Architect Harvard University http://thedata.org TS-4656

More information

The Billion Object Platform (BOP): a system to lower barriers to support big, streaming, spatio-temporal data sources

The Billion Object Platform (BOP): a system to lower barriers to support big, streaming, spatio-temporal data sources FOSS4G 2017 Boston The Billion Object Platform (BOP): a system to lower barriers to support big, streaming, spatio-temporal data sources Devika Kakkar and Ben Lewis Harvard Center for Geographic Analysis

More information

Developing Applications with Java EE 6 on WebLogic Server 12c

Developing Applications with Java EE 6 on WebLogic Server 12c Developing Applications with Java EE 6 on WebLogic Server 12c Duration: 5 Days What you will learn The Developing Applications with Java EE 6 on WebLogic Server 12c course teaches you the skills you need

More information

Cloud Computing. An introduction using MS Office 365, Google, Amazon, & Dropbox.

Cloud Computing. An introduction using MS Office 365, Google, Amazon, & Dropbox. Cloud Computing An introduction using MS Office 365, Google, Amazon, & Dropbox. THIS COURSE Will introduce the benefits and limitations of adopting cloud computing for your business. Will introduce and

More information

This tutorial is meant for software developers who want to learn how to lose less time on API integrations!

This tutorial is meant for software developers who want to learn how to lose less time on API integrations! CloudRail About the Tutorial CloudRail is an API integration solution that speeds up the process of integrating third-party APIs into an application and maintaining them. It does so by providing libraries

More information

Job Description: Junior Front End Developer

Job Description: Junior Front End Developer Job Description: Junior Front End Developer As a front end web developer, you would be responsible for managing the interchange of data between the server and the users, as well as working with our design

More information

Building a Data Catalog

Building a Data Catalog Building a Data Catalog Promoting Data Reuse and Collaboration at an Academic Medical Center Kevin Read, MLIS, MAS Alisa Surkis, PhD, MLIS EXTERNAL DATASETS 2 EXTERNAL DATASETS INTERNAL DATASETS 3 NYU

More information

RADAR. Establishing a generic Research Data Repository: RESEARCH DATA REPOSITORY. Dr. Angelina Kraft

RADAR. Establishing a generic Research Data Repository: RESEARCH DATA REPOSITORY. Dr. Angelina Kraft RESEARCH DATA REPOSITORY http://www.radar-projekt.org http://www.radar-service.eu Establishing a generic Research Data Repository: RADAR Digital Infrastructures for Research 2016 Conference 28 th - 30

More information

What is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)?

What is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)? What is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)? What is Amazon Machine Image (AMI)? Amazon Elastic Compute Cloud (EC2)?

More information

Creating engaging website experiences on any device (e.g. desktop, tablet, smartphone) using mobile responsive design.

Creating engaging website experiences on any device (e.g. desktop, tablet, smartphone) using mobile responsive design. Evoq Content: A CMS built for marketers to deliver modern web experiences Content is central to your ability to find, attract and convert customers. According to Forrester Research, buyers spend two-thirds

More information

EUDAT Towards a Collaborative Data Infrastructure

EUDAT Towards a Collaborative Data Infrastructure EUDAT Towards a Collaborative Data Infrastructure Daan Broeder - MPI for Psycholinguistics - EUDAT - CLARIN - DASISH Bielefeld 10 th International Conference Data These days it is so very easy to create

More information

From Java EE to Jakarta EE. A user experience

From Java EE to Jakarta EE. A user experience From Java EE to Jakarta EE A user experience A few words about me blog.worldline.tech @jefrajames Speaker me = SpeakerOf.setLastName( James ).setfirstname( Jean-François ).setbackgroundinyears(32).setmindset(

More information

Prediction of workflow execution time using provenance traces: practical applications in medical data processing

Prediction of workflow execution time using provenance traces: practical applications in medical data processing Prediction of workflow execution time using provenance traces: practical applications in medical data processing Hugo Hiden Simon Woodman Paul Watson How long will my program take to run? Part of a bigger

More information

SEAD Data Services. Jim Best Practices in Data Infrastructure Workshop. Cooperative agreement #OCI

SEAD Data Services. Jim Best Practices in Data Infrastructure Workshop. Cooperative agreement #OCI SEAD Data Services Jim Myers(myersjd@umich.edu), Best Practices in Data Infrastructure Workshop Cooperative agreement #OCI0940824 SEAD: Sustainable Environment - Actionable Data An NSF DataNet project

More information

Building a Digital Library Software

Building a Digital Library Software Building a Software INVENIO, Part 1 J-Y. Le Meur Department of Information Technology CERN JINR-CERN School on GRID and Information Management Systems 14 May 2012 Outline 1 2 3 4 Outline 1 2 3 4 A physicist

More information

Cloud platforms T Mobile Systems Programming

Cloud platforms T Mobile Systems Programming Cloud platforms T-110.5130 Mobile Systems Programming Agenda 1. Motivation 2. Different types of cloud platforms 3. Popular cloud services 4. Open-source cloud 5. Cloud on this course 6. Some useful tools

More information

Science-as-a-Service

Science-as-a-Service Science-as-a-Service The iplant Foundation Rion Dooley Edwin Skidmore Dan Stanzione Steve Terry Matthew Vaughn Outline Why, why, why! When duct tape isn t enough Building an API for the web Core services

More information

Mega-scale Postgres How to run 1,000,000 Postgres Databases

Mega-scale Postgres How to run 1,000,000 Postgres Databases Mega-scale Postgres How to run 1,000,000 Postgres Databases Program What is Heroku & Heroku Postgres? Organizing principles for mega-scale operations Heroku Postgres Code deployment is good, but what

More information

Roles. Ecosystem Flow of Information between Roles Accountability

Roles. Ecosystem Flow of Information between Roles Accountability Roles Ecosystem Flow of Information between Roles Accountability Role Definitions Role Silo Job Tasks Compute Admin The Compute Admin is responsible for setting up and maintaining the physical and virtual

More information

B2SAFE metadata management

B2SAFE metadata management B2SAFE metadata management version 1.2 by Claudio Cacciari, Robert Verkerk, Adil Hasan, Elena Erastova Introduction The B2SAFE service provides a set of functions for long term bit stream data preservation:

More information

Paving the Rocky Road Toward Open and FAIR in the Field Sciences

Paving the Rocky Road Toward Open and FAIR in the Field Sciences Paving the Rocky Road Toward Open and FAIR Kerstin Lehnert Lamont-Doherty Earth Observatory, Columbia University IEDA (Interdisciplinary Earth Data Alliance), www.iedadata.org IGSN e.v., www.igsn.org Field

More information

FLAT: A CLARIN-compatible repository solution based on Fedora Commons

FLAT: A CLARIN-compatible repository solution based on Fedora Commons FLAT: A CLARIN-compatible repository solution based on Fedora Commons Paul Trilsbeek The Language Archive Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Paul.Trilsbeek@mpi.nl Menzo

More information

DataSTORRE Deposit Guide

DataSTORRE Deposit Guide DataSTORRE Deposit Guide Introduction DataStorre is an online digital repository of multi-disciplinary research datasets produced at the University of Stirling. University of Stirling researchers who have

More information

June 2, 2015, 10:30am EDT Food and Drug Administration, New Hampshire Ave., WO 66, Silver Spring, MD Pre submission Document ID: Q150777

June 2, 2015, 10:30am EDT Food and Drug Administration, New Hampshire Ave., WO 66, Silver Spring, MD Pre submission Document ID: Q150777 Tidepool FDA Pre-submission Meeting Minutes June 2, 2015, 10:30am EDT Food and Drug Administration, 10903 New Hampshire Ave., WO 66, Silver Spring, MD 20993 Pre submission Document ID: Attached: Presentation

More information

CLOUD MANAGEMENT AND SECURITY

CLOUD MANAGEMENT AND SECURITY CLOUD MANAGEMENT AND SECURITY Imad M. Abbadi University of Oxford, UK Wiley Contents About the Author Preface Acknowledgments Acronyms xi xiii xix xxi 1 Introduction 1 1.1 Overview 1 1.2 Cloud Definition

More information

Dataverse Usability Evaluation: Findings & Recommendations. Presented by Eric Gibbs Lin Lin Elizabeth Quigley

Dataverse Usability Evaluation: Findings & Recommendations. Presented by Eric Gibbs Lin Lin Elizabeth Quigley Dataverse Usability Evaluation: Findings & Recommendations Presented by Eric Gibbs Lin Lin Elizabeth Quigley Agenda Introduction Scenarios Findings & Recommendations Next Steps Introduction Scenarios Scenario

More information

Cloud Computing & Visualization

Cloud Computing & Visualization Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International

More information

FREYA Connected Open Identifiers for Discovery, Access and Use of Research Resources

FREYA Connected Open Identifiers for Discovery, Access and Use of Research Resources FREYA Connected Open Identifiers for Discovery, Access and Use of Research Resources Brian Matthews Data Science and Technology Group Scientific Computing Department STFC Persistent Identifiers Long-lasting

More information

Building A Billion Spatio-Temporal Object Search and Visualization Platform

Building A Billion Spatio-Temporal Object Search and Visualization Platform 2017 2 nd International Symposium on Spatiotemporal Computing Harvard University Building A Billion Spatio-Temporal Object Search and Visualization Platform Devika Kakkar, Benjamin Lewis Goal Develop a

More information

Persistent Identifier the data publishing perspective. Sünje Dallmeier-Tiessen, CERN 1

Persistent Identifier the data publishing perspective. Sünje Dallmeier-Tiessen, CERN 1 Persistent Identifier the data publishing perspective Sünje Dallmeier-Tiessen, CERN 1 Agenda Data Publishing Specific Data Publishing Needs THOR Latest Examples/Solutions Publishing Centerpiece of research

Archive II. The archive. 26/May/15 Archive II The archive 26/May/15 What is an archive? Is a service that provides long-term storage and access of data. Long-term usually means ~5years or more. Archive is strictly not the same as a backup.

Licensing Guide for Partners Microsoft PowerApps & Microsoft Flow Licensing Guide for Partners November 2016 The Microsoft PowerApps & Flow Licensing Guide November 2016 Contents Introduction to Microsoft PowerApps & Microsoft Flow...

DataBridge: CREATING BRIDGES TO FIND DARK DATA. Vol. 3, No. 5 July 2015 RENCI WHITE PAPER SERIES. The Team Vol. 3, No. 5 July 2015 RENCI WHITE PAPER SERIES DataBridge: CREATING BRIDGES TO FIND DARK DATA The Team HOWARD LANDER Senior Research Software Developer (RENCI) ARCOT RAJASEKAR, PhD Chief Domain Scientist,

Cloud Computing: Making the Right Choice for Your Organization Cloud Computing: Making the Right Choice for Your Organization A decade ago, cloud computing was on the leading edge. Now, 95 percent of businesses use cloud technology, and Gartner says that by 2020,

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as NorStore a national infrastructure for scientific data Andreas O Jaunsen UNINETT Sigma as About UNINETT Sigma UNINETT Sigma AS is a private company established by the Ministry of science and education

sqamethods Approach to Building Testing Automation Systems sqamethods Approach to Building Testing Automation Systems By Leopoldo A. Gonzalez leopoldo@sqamethods.com BUILDING A TESTING AUTOMATION SYSTEM...3 OVERVIEW...3 GOALS FOR AN AUTOMATION SYSTEM...3 BEGIN

OpenAIRE. Fostering the social and technical links that enable Open Science in Europe and beyond Alessia Bardi and Paolo Manghi, Institute of Information Science and Technologies CNR Katerina Iatropoulou, ATHENA, Iryna Kuchma and Gwen Franck, EIFL Pedro Príncipe, University of Minho OpenAIRE Fostering

CO Java EE 7: Back-End Server Application Development CO-85116 Java EE 7: Back-End Server Application Development Summary Duration 5 Days Audience Application Developers, Developers, J2EE Developers, Java Developers and System Integrators Level Professional

Get the Most Out of GoAnywhere: Achieving Cloud File Transfers and Integrations Get the Most Out of GoAnywhere: Achieving Cloud File Transfers and Integrations Today s Presenter Dan Freeman, CISSP Senior Solutions Consultant HelpSystems Steve Luebbe Director of Development HelpSystems

8.0 Help for End Users About Jive for SharePoint System Requirements Using Jive for SharePoint... 6 for SharePoint 2010/2013 Contents 2 Contents 8.0 Help for End Users... 3 About Jive for SharePoint... 4 System Requirements... 5 Using Jive for SharePoint... 6 Overview of Jive for SharePoint... 6 Accessing

Cloud Foundry and OpenStack Free Signup: www.cloudfoundry.com, code: openstack2013 Cloud Foundry and OpenStack Ferran Rodenas, Dekel Tankel Cloud Foundry, Pivotal frodenas@vmware.com, twitter: @ferdy dekel@vmware.com, twitter: @dekt

Unpacking Office 365 A high level overview of the apps and services bundled in the standard Office 365 subscription: What is it Use cases FAQ Unpacking Office 365 A high level overview of the apps and services bundled in the standard Office 365 subscription: What is it Use cases Unpacking Office 365 Making the move to Office 365? Whether your

Informatica Enterprise Information Catalog Data Sheet Informatica Enterprise Information Catalog Benefits Automatically catalog and classify all types of data across the enterprise using an AI-powered catalog Identify domains and entities with

Embedded Technosolutions Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication

The future of database technology is in the clouds Database.com Getting Started Series White Paper The future of database technology is in the clouds WHITE PAPER 0 Contents OVERVIEW... 1 CLOUD COMPUTING ARRIVES... 1 THE FUTURE OF ON-PREMISES DATABASE SYSTEMS:

AWS Lambda: Event-driven Code in the Cloud AWS Lambda: Event-driven Code in the Cloud Dean Bryen, Solutions Architect AWS Andrew Wheat, Senior Software Engineer - BBC April 15, 2015 London, UK 2015, Amazon Web Services, Inc. or its affiliates.

Scalable, Reliable Marshalling and Organization of Distributed Large Scale Data Onto Enterprise Storage Environments * Scalable, Reliable Marshalling and Organization of Distributed Large Scale Data Onto Enterprise Storage Environments * Joesph JaJa joseph@ Mike Smorul toaster@ Fritz McCall fmccall@ Yang Wang wpwy@ Institute

Digital Curation and Preservation: Defining the Research Agenda for the Next Decade Storage Resource Broker Digital Curation and Preservation: Defining the Research Agenda for the Next Decade Reagan W. Moore moore@sdsc.edu http://www.sdsc.edu/srb Background NARA research prototype persistent

Provisioning with SUSE Enterprise Storage. Nyers Gábor Trainer & Provisioning with SUSE Enterprise Storage Nyers Gábor Trainer & Consultant @Trebut gnyers@trebut.com Managing storage growth and costs of the software-defined datacenter PRESENT Easily scale and manage

Cloud FastPath: Highly Secure Data Transfer Cloud FastPath: Highly Secure Data Transfer Tervela helps companies move large volumes of sensitive data safely and securely over network distances great and small. Tervela has been creating high performance

Data Management Plans. Sarah Jones Digital Curation Centre, Glasgow Data Management Plans Sarah Jones Digital Curation Centre, Glasgow sarah.jones@glasgow.ac.uk Twitter: @sjdcc Data Management Plan (DMP) workshop, e-infrastructures Austria, Vienna, 17 November 2016 What

OPENSTACK PRIVATE CLOUD WITH GITHUB OPENSTACK PRIVATE CLOUD WITH GITHUB Kiran Gurbani 1 Abstract Today, with rapid growth of the cloud computing technology, enterprises and organizations need to build their private cloud for their own specific

EUDAT - Open Data Services for Research EUDAT - Open Data Services for Research Johannes Reetz EUDAT operations Max Planck Computing & Data Centre Science Operations Workshop 2015 ESO, Garching 24-27th November 2015 EUDAT receives funding from

Research at PNNL: Powered by AWS NLIT 2018 Research at PNNL: Powered by AWS NLIT 2018 RALPH PERKO AND MIKE GIARDINELLI Pacific Northwest National Laboratory Reference herein to any specific commercial product, process, or service by trade name,

ganeti Comparing IaaS VMware vs OpenStack vs Google s Ganeti November 2013 Giuseppe Gippa Paternò ganeti Comparing IaaS VMware vs OpenStack vs Google s Ganeti November 2013 Giuseppe Gippa Paternò Knowing Gippa... EMEA Cloud Solution Architect for Canonical (the company behind Ubuntu). In this role

Horizon Societies of Symbiotic Robot-Plant Bio-Hybrids as Social Architectural Artifacts. Deliverable D4.1 Horizon 2020 Societies of Symbiotic Robot-Plant Bio-Hybrids as Social Architectural Artifacts Deliverable D4.1 Data management plan (open research data pilot) Date of preparation: 2015/09/30 Start date

EUDAT. Towards a pan-european Collaborative Data Infrastructure EUDAT Towards a pan-european Collaborative Data Infrastructure Giuseppe Fiameni (g.fiameni@cineca.it) Claudio Cacciari SuperComputing, Application and Innovation CINECA Johannes Reatz RZG, Germany Damien

EarthCube and Cyberinfrastructure for the Earth Sciences: Lessons and Perspective from OpenTopography EarthCube and Cyberinfrastructure for the Earth Sciences: Lessons and Perspective from OpenTopography Christopher Crosby, San Diego Supercomputer Center J Ramon Arrowsmith, Arizona State University Chaitan

WHITEPAPER. MemSQL Enterprise Feature List WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure

Enterprise Java Unit 1-Chapter 2 Prof. Sujata Rizal Java EE 6 Architecture, Server and Containers 1. Introduction Applications are developed to support their business operations. They take data as input; process the data based on business rules and provides data or information as output. Based on this,

Services to Make Sense of Data. Patricia Cruse, Executive Director, DataCite Council of Science Editors San Diego May 2017 Services to Make Sense of Data Patricia Cruse, Executive Director, DataCite Council of Science Editors San Diego May 2017 How many journals make data sharing a requirement of publication? https://jordproject.wordpress.com/2013/07/05/going-back-to-basics-reusing-data/
