irods at TACC: Secure Infrastructure for Open Science Chris Jordan

Similar documents
ACCI Recommendations on Long Term Cyberinfrastructure Issues: Building Future Development

HPC Storage Use Cases & Future Trends

Overview of the Texas Advanced Computing Center. Bill Barth TACC September 12, 2011

Journey Towards Science DMZ. Suhaimi Napis Technical Advisory Committee (Research Computing) MYREN-X Universiti Putra Malaysia

RESEARCH DATA DEPOT AT PURDUE UNIVERSITY

CSD3 The Cambridge Service for Data Driven Discovery. A New National HPC Service for Data Intensive science

Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands

Simplifying Collaboration in the Cloud

The Data exacell DXC. J. Ray Scott DXC PI May 17, 2016

HPC Hardware Overview

Leveraging High Performance Computing Infrastructure for Trusted Digital Preservation

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan

WVU RESEARCH COMPUTING INTRODUCTION. Introduction to WVU s Research Computing Services

Big Data 2015: Sponsor and Participants Research Event ""

Georgia State University Cyberinfrastructure Plan

TACC s Stampede Project: Intel MIC for Simulation and Data-Intensive Computing

Richard Marciano Alexandra Chassanoff David Pcolar Bing Zhu Chien-Yi Hu. March 24, 2010

High Performance Computing Data Management. Philippe Trautmann BDM High Performance Computing Global Research

Data Movement & Storage Using the Data Capacitor Filesystem

Building Self-Healing Mass Storage Arrays. for Large Cluster Systems

Organizational Update: December 2015

HPC Capabilities at Research Intensive Universities

IBM Spectrum NAS, IBM Spectrum Scale and IBM Cloud Object Storage

The Cambridge Bio-Medical-Cloud An OpenStack platform for medical analytics and biomedical research

IT Infrastructure: Poised for Change

Magellan Project. Jeff Broughton NERSC Systems Department Head October 7, 2009

GPFS for Life Sciences at NERSC

Storage for HPC, HPDA and Machine Learning (ML)

Writing a Data Management Plan A guide for the perplexed

Parallel File Systems. John White Lawrence Berkeley National Lab

Introduction to FREE National Resources for Scientific Computing. Dana Brunson. Jeff Pummill

Introducing Tegile. Company Overview. Product Overview. Solutions & Use Cases. Partnering with Tegile

I data set della ricerca ed il progetto EUDAT

irods usage at CC-IN2P3: a long history

Implementing a Genomic Data Management System using irods at Bayer HealthCare

SGI Overview. HPC User Forum Dearborn, Michigan September 17 th, 2012

DDN s Vision for the Future of Lustre LUG2015 Robert Triendl

Netherlands Institute for Radio Astronomy. May 18th, 2009 Hanno Holties

Scaling a Global File System to the Greatest Possible Extent, Performance, Capacity, and Number of Users

The iplant Data Commons

Regional & National HPC resources available to UCSB

Table 1 The Elastic Stack use cases Use case Industry or vertical market Operational log analytics: Gain real-time operational insight, reduce Mean Ti

Cyberinfrastructure!

DSpace Fedora. Eprints Greenstone. Handle System

System Software for Big Data and Post Petascale Computing

Jetstream: Adding Cloud-based Computing to the National Cyberinfrastructure

Data Analytics and Storage System (DASS) Mixing POSIX and Hadoop Architectures. 13 November 2016

Managing Petabytes of data with irods. Jean-Yves Nief CC-IN2P3 France

Benoit DELAUNAY Benoit DELAUNAY 1

Isilon: Raising The Bar On Performance & Archive Use Cases. John Har Solutions Product Manager Unstructured Data Storage Team

Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science

HPC Growing Pains. IT Lessons Learned from the Biomedical Data Deluge

I/O Challenges: Todays I/O Challenges for Big Data Analysis. Henry Newman CEO/CTO Instrumental, Inc. April 30, 2013

Engagement With Scientific Facilities

The Benefits of Solid State in Enterprise Storage Systems. David Dale, NetApp

Minimum Laserfiche Rio Hardware Specifications

Users and utilization of CERIT-SC infrastructure

GPFS Experiences from the Argonne Leadership Computing Facility (ALCF) William (Bill) E. Allcock ALCF Director of Operations

Windows Azure Services - At Different Levels

Advancing Library Cyberinfrastructure for Big Data Sharing and Reuse. Zhiwu Xie

Petabytes of Preservation on Tape Jason Pierson Oct 2012

Long Term Data Preservation for CDF at INFN-CNAF

The role of PIONIER network in long term preservation services for cultural heritage institutions in Poland

Swedish National Storage Infrastructure for Academic Research with irods

Highly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture

朱义普. Resolving High Performance Computing and Big Data Application Bottlenecks with Application-Defined Flash Acceleration. Director, North Asia, HPC

DDN. DDN Updates. Data DirectNeworks Japan, Inc Shuichi Ihara. DDN Storage 2017 DDN Storage

Jetstream Overview A national research and education cloud

Outline. March 5, 2012 CIRMMT - McGill University 2

Matthias Wobben working in Berlin, Germany. Senior Sales Engineer at Nextcloud

NUIT Tech Talk Topics in Research Computing: XSEDE and Northwestern University Campus Champions

Scalability Testing of DNE2 in Lustre 2.7 and Metadata Performance using Virtual Machines Tom Crowe, Nathan Lavender, Stephen Simms

Overview of XSEDE for HPC Users Victor Hazlewood XSEDE Deputy Director of Operations

Swedish National Storage Infrastructure for Academic Research with irods

Building Effective CyberGIS: FutureGrid. Marlon Pierce, Geoffrey Fox Indiana University

IEPSAS-Kosice: experiences in running LCG site

VMware, Cisco and EMC The VCE Alliance

HOW TO BUILD A MODERN AI

SQL Azure. Abhay Parekh Microsoft Corporation

Scality RING on Cisco UCS: Store File, Object, and OpenStack Data at Scale

HPE Storage news. Mauro Colombo Hybrid IT sales & presales manager 23 rd May 2018

WHITEPAPER. MemSQL Enterprise Feature List

Data publication and discovery with Globus

Corente Cloud Services Exchange

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu

EMC ISILON HARDWARE PLATFORM

A Container On a Virtual Machine On an HPC? Presentation to HPC Advisory Council. Perth, July 31-Aug 01, 2017

The Stampede is Coming Welcome to Stampede Introductory Training. Dan Stanzione Texas Advanced Computing Center

Database Services at CERN with Oracle 10g RAC and ASM on Commodity HW

XtreemStore A SCALABLE STORAGE MANAGEMENT SOFTWARE WITHOUT LIMITS YOUR DATA. YOUR CONTROL

The National Center for Genome Analysis Support as a Model Virtual Resource for Biologists

HSC Data Processing Environment ~Upgrading plan of open-use computing facility for HSC data

LHC and LSST Use Cases

Introduction to Grid Computing

Xyratex ClusterStor6000 & OneStor

Academic Workflow for Research Repositories Using irods and Object Storage

<Insert Picture Here> Oracle Storage

Appendix A Recommended Server/Workstation Specifications

DDN Annual High Performance Computing Trends Survey Reveals Rising Deployment of Flash Tiers & Private/Hybrid Clouds vs.

A Big Big Data Platform

Transcription:

irods at TACC: Secure Infrastructure for Open Science Chris Jordan

What is TACC? Texas Advanced Computing Center Cyberinfrastructure Resources for Open Science University of Texas System 9 Academic, 6 Health campuses NSF and NIH researchers Commercial and non-commercial partners Very diverse community with contrasting needs

Data Intensive Computing Data Analysis and Statistics Data Management and Collections Data management planning and cleanup File and database-oriented collections Pure data dissemination/data sharing Long term storage and project partnerships Web repositories and custom toolkits

The TACC Ecosystem Stampede Top 10/Petaflop-class traditional cluster HPC system Stockyard 20 Petabytes of combined disk storage for all data needs Ranch 160 Petabytes of tape archive storage Maverick/Rustler/Rodeo Niche systems for visualization, Hadoop, VMs, etc

Corral 5 Petabytes replicated DDN storage GPFS basic file system SAS and SATA disk tiers >100 Projects, 100s of users, >4PB of data stored

Corral irods irods 3.3.1, idrop-web, Davis Resources on replicated or unreplicated Corral file systems, Ranch tape archive Primarily used for data sharing, instrument facilities and other special projects

Corral Example: Institute of Classical Archeology Specialized metadata requirements Discipline-specific web interface Highly structured collection Automated metadata extraction on ingest, registration into PHP-based website Two-way metadata sync with website DB

Wrangler TACC Mass Storage Subsystem 10 PB (Replicated) Indiana Mass Storage Subsystem 10 PB (Replicated) IB Interconnect 120 Lanes (56 Gb/s) non-blocking Interconnect with 1 TB/s throughput Access & Analysis System 96 Nodes 128 GB+ Memory Haswell CPUs High Speed Storage System 500+ TB 1 TB/s 250M+ IOPS 40 Gb/s Ethernet 100 Gbps Public Network Globus Access & Analysis System 24 Nodes 128 GB+ Memory Haswell CPUs

Wrangler Service Model Need to dynamically provision a wide range of database, irods, web services Wrangler web portal provides simple user interface for requesting services Includes irods and irods policy/feature selection

Wrangler Dynamic Services Services can be dynamic or persistent Dynamic services run on compute nodes, for days or weeks Collaborative creation/deployment model Can deploy irods as dynamic service Want to test irods database backends on the fastest storage system on the planet? OK

Wrangler Portal

irods in Wrangler Primary Data Management mechanism Will support data publication w/ DOIs, audit trails, checksum/manifest verification, etc Web and WebDAV interfaces Secondary use as platform for experimentation (new rules development, resource hierarchies, website integration)

Use case: Research Instruments Genome sequencing, CT scanners, fmri Telescopes, particle colliders, and bears Central challenge is data management and dissemination to customers irods used for metadata, direct output, access controls, short and long-term storage

HIPAA/FISMA Requirements Documentation, Policy, Documentation Secure replicated storage at rest Secure transmission/networking Effective access controls User education

Securing irods Best practices (mailing list, other big users) Difficult to mix secure and less secure installations (strict ACLs, web sharing, etc) De-identify or isolate secure data Need to look at infrastructure as a whole: Network security is crucial Firewall is required, VPN may be required

Securing irods 2 Really need two-factor authentication Possible with PAM (?) Looking at Toopher plus irods Need improved SSL support irods + SSL + PAM 2-factor + Attractive web interface = Killer app

Chris Jordan ctjordan@tacc.utexas.edu For more information: www.tacc.utexas.edu