The Future of Galaxy. Nate Coraor galaxyproject.org


1 The Future of Galaxy Nate Coraor galaxyproject.org

2 Galaxy is...
- A framework for scientists
- Enables usage of complicated command line tools
- Deals with file formats as transparently as possible
- Provides a rich visualization and visual analytics system

3 Galaxy is...
- getgalaxy.org: free, open source software. Bring your own compute, storage, tools. Maximize privacy and security.
- usegalaxy.org/cloud: Galaxy cluster in Amazon EC2. Buy as much compute, storage as you need.
- usegalaxy.org: free, public Galaxy server. 3.5 TB of reference data, 0.8 PB of user data, 4,000+ jobs/day.

4

5 New Users per Month
[Chart; time axis from Jan 2010 to Jan 2013] Wednesday, July 17, 13

6 usegalaxy.org data growth
[Chart; annotations: "+128 cores for NGS/multicore jobs", "Data quotas implemented..."]

7 usegalaxy.org frustration growth
[Chart; axes: Total Jobs Completed (count), Jobs Deleted Before Run (% of ...)]

8 Where we are

9 Where we are going

10 Where we are going

11 Where we are going
- Continuing work with ECSS to submit jobs to disparate XSEDE resources
- Globus Online endpoint for usegalaxy.org
- Allow users to utilize their XSEDE allocations directly through usegalaxy.org
- Display detailed information about queue position and resource utilization

12 Massive Scale Analysis
- Improve Galaxy workflow engine and UI
- We can run workflows on single datasets now
- What about hundreds or thousands?

13 Scaling Efforts
- So many tools and workflows, not enough manpower
  - Focus on building infrastructure to allow community to integrate and share tools, workflows, and best practices
- Too much data, not enough infrastructure
  - Support greater access to usegalaxy.org public and user data from local and cloud Galaxy instances

14

15 Data Exchange
- A big data store for encouraging data exchange among Galaxy instances
- Galaxy data mirrored in PSC SLASH2-backed Data Supercell
- Federation

16

17 Establishing an XSEDE Galaxy Gateway
XSEDE ECSS Symposium, December
Philip Blood, Senior Computational Scientist, Pittsburgh Supercomputing Center

18 Galaxy Team: PSC Team: Pittsburgh Supercomputing Center 2010 Pittsburgh Supercomputing Center 18

19 643 HiSeqs = 6.5 Pb/year 2013 Pittsburgh Supercomputing Center 2010 Pittsburgh Supercomputing Center 19

20 Using Galaxy to Handle Big Data?
Compartmentalized solutions:
- Private Galaxy installations on campuses
- Galaxy installations on XSEDE (e.g. NCGAS)
- Galaxy installations at other CI/cloud providers (e.g. Globus Genomics)
- Galaxy on public clouds (e.g. Amazon)

21 The Vision: A United Federation of Galaxies
Ultimately, we envision that any Galaxy instance (in any lab, not just Galaxy Main) will be able to spawn jobs, access data, and share data on external infrastructure, whether this is an XSEDE resource, a cluster of Amazon EC2 machines, a remote storage array, etc.

22 A Step Forward: Make Galaxy Main an XSEDE Galaxy Gateway
- Certain Galaxy Main workflows or tasks will be executed on XSEDE resources
- Especially tasks that require HPC, e.g. sending the de novo assembly applications Velvet (genome) and Trinity (transcriptome) to PSC Blacklight (up to 16 TB of coherent shared memory per process)
- Should be transparent to the user of usegalaxy.org

23 Key Problems to Solve
- Data Migration: Galaxy currently relies on a shared filesystem between the instance host and the execution server to store the reference and user data required by the workflow. This is implemented via NFS.
- Remote Job Submission: Galaxy job execution currently requires a direct interface with the resource manager on the execution server.

24 What We've Done So Far*
Addressing Data Migration Issues
- Established 10 GigE link between PSC and Penn State
- Established a common wide area distributed filesystem between PSC and Penn State using SLASH2
Addressing Remote Job Submission
- Created a new Galaxy job-running plugin for SSH job submission
- Incorporated Velvet and Trinity into Galaxy's XML interface
- Successfully submitted test jobs from Penn State and executed on Blacklight using the data replicated via SLASH2 from Penn State to PSC
*Some of these points will be revisited, since Galaxy is now hosted at TACC
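The idea of submitting a job over SSH can be illustrated with a minimal Python sketch. This is not the Galaxy plugin mentioned on the slide: the function names, the user and host values, and the choice of `qsub` as the remote submit command are all assumptions made for illustration.

```python
import shlex
import subprocess


def build_ssh_submit_command(user, host, job_script, submit_cmd="qsub"):
    """Build the argv for submitting job_script on a remote host over SSH.

    All values are hypothetical; submit_cmd defaults to "qsub" only as an
    assumption about the remote resource manager.
    """
    # Quote the script path so the remote shell treats it as one argument.
    remote_command = f"{submit_cmd} {shlex.quote(job_script)}"
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_command]


def submit(user, host, job_script):
    """Run the submission and return whatever the resource manager prints."""
    argv = build_ssh_submit_command(user, host, job_script)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Because the job script lives on a filesystem visible from both sites (as with the shared SLASH2 mount described above), only the path has to cross the SSH connection, not the data.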

25 Galaxy Remote Data Architecture
- Access is identical from Galaxy Main and PSC to the shared dataset via /galaxys2
- SLASH2 file system handles consistency and multiple residency coherency and presence
- Local copies are maintained for performance
- Jobs run on PSC compute resources such as Blacklight, as well as Galaxy Main
[Diagram labels: PSC; Data Generation and Processing Nodes; /galaxys2 SLASH2 Wide-Area Common File system; GalaxyFS; Galaxy Main; /galaxys]

26 Galaxy Main Gateway: What Remains to Be Done (1)
- Integrate this work with the production public Galaxy site, usegalaxy.org (now hosted at TACC)
- Dynamic job submission: allowing the selection of appropriate remote or local resources (cores, memory, walltime, etc.) based on individual job requirements (possibly using an Open Grid Services Architecture Basic Execution Service compatible service, such as Unicore)
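The dynamic job submission item can be sketched as a small selection rule. This is an illustrative assumption about how such a rule might look, not Galaxy's implementation: the class names, the resource names, and every capacity number below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class JobRequirements:
    cores: int
    memory_gb: float
    walltime_hours: float


@dataclass(frozen=True)
class Resource:
    name: str
    cores: int
    memory_gb: float
    max_walltime_hours: float
    remote: bool


def select_resource(job: JobRequirements,
                    resources: Sequence[Resource]) -> Optional[Resource]:
    """Return a resource that satisfies the job, preferring local ones."""
    fitting = [
        r for r in resources
        if r.cores >= job.cores
        and r.memory_gb >= job.memory_gb
        and r.max_walltime_hours >= job.walltime_hours
    ]
    # sorted() is stable and False sorts before True, so local resources
    # come first while the original order is kept within each group.
    fitting = sorted(fitting, key=lambda r: r.remote)
    return fitting[0] if fitting else None


# Hypothetical resources, for illustration only.
RESOURCES = [
    Resource("local_cluster", cores=16, memory_gb=64, max_walltime_hours=48, remote=False),
    Resource("remote_large_memory", cores=256, memory_gb=2048, max_walltime_hours=96, remote=True),
]
```

With these made-up numbers, a small mapping job stays on the local cluster, while a large-memory assembly job falls through to the remote resource.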

27 What Remains to Be Done (2)
- Galaxy-controlled data management: to intelligently and efficiently migrate and use data on distributed compute resources
  - Testing various data migration strategies with SLASH2 and other available technologies
  - Further developing SLASH2 to meet Federated Galaxy requirements through recent NSF DIBBs award at PSC
- Authentication with Galaxy instances: using XSEDE or other credentials, e.g., InCommon/CILogon (see upcoming talk by Indiana)
- Additional data transfer capabilities in Galaxy: such as iRODS and Globus Online (see upcoming talk on Globus Genomics)

28 Eventually: Use These Technologies to Enable Universal Federation

29 Appendix
- Initial Galaxy Data Staging to PSC
- Underlying SLASH2 Architecture

30 Initial Galaxy Data Staging to PSC
- Transferred 470 TB in 21 days from PSU to PSC (average ~22 TB/day; peak 40 TB/day)
- rsync used to initially stage and synchronize subsequent updates
- Data copy maintained in PSC in /arc file system available from compute nodes
[Diagram labels: Data Generation Nodes; Storage; Penn State; PSC; 10 GigE link; Data SuperCell]
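The transfer figures on this slide can be checked with a short calculation. The sketch below assumes decimal terabytes (10^12 bytes) and treats the 10 GigE link as a nominal 10 Gbit/s; neither assumption is stated in the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BITS_PER_TB = 8 * 10**12  # assumption: decimal terabytes


def tb_per_day_to_gbit_per_s(tb_per_day):
    """Convert a daily transfer volume into a sustained rate in Gbit/s."""
    return tb_per_day * BITS_PER_TB / SECONDS_PER_DAY / 10**9


average_tb_per_day = 470 / 21  # total volume divided by elapsed days
average_gbps = tb_per_day_to_gbit_per_s(average_tb_per_day)
peak_gbps = tb_per_day_to_gbit_per_s(40)

print(f"average: {average_tb_per_day:.1f} TB/day = {average_gbps:.2f} Gbit/s")
print(f"peak:    40.0 TB/day = {peak_gbps:.2f} Gbit/s")
```

Under these assumptions the average works out to roughly 22.4 TB/day, or about 2.1 Gbit/s sustained, and the peak day to about 3.7 Gbit/s, both well below the nominal capacity of the link.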

31 Underlying SLASH2 Architecture
Metadata Server (MDS)
- One at Galaxy Main and one at PSC for performance
- Converts pathnames to object IDs
- Schedules updates when copies become inconsistent
- Consistency protocol to avoid incoherent data
- Residency and network scheduling policies enforced
I/O Servers (IOS)
- I/O servers are very lightweight
- Can use most backing file systems (ZFS, ext4fs, etc.)
Clients
- Clients are compute resources & dedicated front ends
- Dataset residency requests issued from administrators and/or users
[Diagram labels: Clients; All other file ops (RENAME, SYMLINK, etc.); READ and WRITE; I/O Servers (IOS) x4]
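The metadata server duties listed above (mapping pathnames to object IDs, tracking residency, scheduling updates when copies diverge) can be pictured with a toy Python model. This is purely illustrative and is not SLASH2 code: the class, its methods, and the I/O server names are hypothetical.

```python
import itertools


class ToyMetadataServer:
    """Toy model of the bookkeeping listed on the slide (not SLASH2 code)."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._object_ids = {}  # pathname -> object ID
        self._wanted = {}      # object ID -> I/O servers that should hold a copy
        self._current = {}     # object ID -> I/O servers whose copy is up to date

    def lookup(self, path):
        """Convert a pathname to an object ID, allocating one on first use."""
        if path not in self._object_ids:
            oid = next(self._next_id)
            self._object_ids[path] = oid
            self._wanted[oid] = set()
            self._current[oid] = set()
        return self._object_ids[path]

    def request_residency(self, path, ios):
        """Record that a copy of the object should live on this I/O server."""
        self._wanted[self.lookup(path)].add(ios)

    def record_write(self, path, ios):
        """A write on one I/O server makes every other copy stale."""
        oid = self.lookup(path)
        self._wanted[oid].add(ios)
        self._current[oid] = {ios}

    def pending_updates(self, path):
        """I/O servers that still need a fresh copy of the object."""
        oid = self.lookup(path)
        return sorted(self._wanted[oid] - self._current[oid])

    def mark_synced(self, path, ios):
        """Record that an I/O server has received the current data."""
        self._current[self.lookup(path)].add(ios)
```

In this model a write at one site followed by a residency request from the other leaves exactly one pending update, which clears once the copy is marked as synced.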

32 Funded by National Science Foundation
1. Large memory clusters for assembly
2. Bioinformatics consulting for biologists
3. Optimized software for better efficiency
Collaboration across IU, TACC, SDSC, and PSC.
Open for business at:

33 Making it easier for Biologists
- Web interface to NCGAS resources
- Supports many bioinformatics tools
- Available for both research and instruction
[Diagram: Computational Skills, from LOW (Common) to HIGH (Rare)]

34 GALAXY.NCGAS.ORG Model
- Individual projects can get duplicate boxes provided they support it themselves.
- Virtual box hosting Galaxy.ncgas.org
- The host for each tool is configured individually
- NCGAS establishes tools, hardens them, and moves them into production.
[Diagram labels: Quarry; Mason; Archive; Data Capacitor]

35 Moving Forward
[Diagram labels: Your Friendly Neighborhood Sequencing Center (x3); 100 Gbps; 10 Gbps; NCGAS Mason (Free for NSF users); Globus On-line and other tools; Data Capacitor; NO data storage Charges; Lustre WAN File System; Other NCGAS XSEDE Resources; IU POD (12 cents per core hour); Optimized Software]

36 NCGAS Galaxy Usage
[Chart; y-axis: Core Hours, x-axis: Month]

37 CILogon Authentication for Galaxy Dec. 17, 2013

38 Goals and Approaches
- NCGAS Authentication Requirements:
  - XSEDE users can authenticate with NCGAS Galaxy through InCommon credentials.
  - Only NCGAS authorized users can authenticate and use the resource.
- CILogon Service allows users to authenticate with their home organization and obtain a certificate for secure access to CyberInfrastructure.
- It supports MyProxy OAuth protocol for certificate delegation to enable science gateways to access CI on user's behalf.
- Incorporate CILogon as external user authentication for Galaxy, with home-brewed simple authorization mechanism.

39 Technical Challenges
- CILogon OAuth client implementation is Java while Galaxy is Python; Python lacks full featured OAuth libraries supporting RSA-SHA1 signature method required by CILogon's OAuth interface.
- Once authenticated through CILogon, remote username needs to be forwarded to Galaxy via Apache proxy;
- Additional authorization required for CILogon authenticated users;
- Some of the default CILogon IdPs including OpenID providers (Google, PayPal, VeriSign) are not desired.

40 Architecture
[Diagram labels: Authentication; Apache Web Server; HTTP_COOKIE; PHP CILogon OAuth Client]

41 Technical Highlights
- PHP (non Java) implementation of CILogon OAuth Client.
- Configure Apache proxy to Galaxy:
  - Enable Galaxy external user authentication (universe_wsgi.ini);
  - Configure Apache for proxy forwarding (httpd ssl.conf);
  - Configure Apache for CILogon authentication with HTTP_COOKIE rewrite (httpd ssl.conf)
- Customized NCGAS Skin limiting IdP to InCommon academic.
- PHP implementation of simple file-based user authorization.
- Lightweight, packaged for general Galaxy installation.
- Open source and more details at:
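The simple file-based authorization step can be sketched as follows. The slides describe a PHP implementation; this Python version is a stand-in for illustration only, and the file format (one identity per line, `#` starting a comment) and the function names are assumptions.

```python
import io


def load_authorized_users(lines):
    """Read one authorized identity per line, ignoring comments and blanks."""
    users = set()
    for line in lines:
        # Drop anything after '#', then normalise whitespace and case.
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            users.add(entry)
    return users


def is_authorized(remote_user, authorized):
    """Check the identity forwarded by the web server against the allow list."""
    return remote_user.strip().lower() in authorized


# Hypothetical allow list; in practice this would be a file on disk.
ALLOW_LIST = io.StringIO(
    "# authorized users (hypothetical entries)\n"
    "alice@example.edu\n"
    "Bob@Example.edu  # case is normalised\n"
    "\n"
)
AUTHORIZED = load_authorized_users(ALLOW_LIST)
```

The check runs only after authentication has succeeded, so a user with a valid InCommon identity who is not on the list is still turned away, matching the requirement that only authorized users can use the resource.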

42 Demo

43 Experiences in building a next-generation sequencing analysis service using Galaxy, Globus, and Amazon Web Services
Ravi K Madduri, Argonne National Lab and University of Chicago

44 Globus Genomics Architecture

45 Globus Genomics Solution Description
- Integrated identity management, group management and data movement using Globus
- Computational profiles for various analysis tools
- Resources can be provisioned on-demand with Amazon Web Services cloud based infrastructure
- GlusterFS as a shared file system between head nodes and compute nodes
- Provisioned I/O on EBS

46 Globus Genomics Usage

47 Example User: Cox Lab
[Poster image. Affiliations: (1) Computation Institute, University of Chicago, Chicago, IL, USA; (2) Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA; (3) Section Genetic Medicine, University of Chicago, Chicago, IL. Panel titles: Challenges in Next-Gen Sequencing Analysis; Parallel Workflows on Globus Genomics; High Performance, Reusable Consensus]

48 Globus Genomics Pricing

49 Acknowledgments
- This work was supported in part by the NIH through the NHLBI grant: The Cardiovascular Research Grid (R24HL085343) and by the U.S. Department of Energy under contract DE-AC02-06CH
- We are grateful to Amazon, Inc., for an award of Amazon Web Services time that facilitated early experiments.
- The Globus Genomics and Globus Online teams at University of Chicago and Argonne National Laboratory

50 For more information
- More information on Globus Genomics and to sign up:
- More information on Globus Online:
Questions? Thank you!


More information

UNICORE Globus: Interoperability of Grid Infrastructures

UNICORE Globus: Interoperability of Grid Infrastructures UNICORE : Interoperability of Grid Infrastructures Michael Rambadt Philipp Wieder Central Institute for Applied Mathematics (ZAM) Research Centre Juelich D 52425 Juelich, Germany Phone: +49 2461 612057

More information

By Ian Foster. Zhifeng Yun

By Ian Foster. Zhifeng Yun By Ian Foster Zhifeng Yun Outline Introduction Globus Architecture Globus Software Details Dev.Globus Community Summary Future Readings Introduction Globus Toolkit v4 is the work of many Globus Alliance

More information

Galaxy. Daniel Blankenberg The Galaxy Team

Galaxy. Daniel Blankenberg The Galaxy Team Galaxy Daniel Blankenberg The Galaxy Team http://galaxyproject.org Overview What is Galaxy? What you can do in Galaxy analysis interface, tools and datasources data libraries workflows visualization sharing

More information

FeduShare Update. AuthNZ the SAML way for VOs

FeduShare Update. AuthNZ the SAML way for VOs FeduShare Update AuthNZ the SAML way for VOs FeduShare Goals: Provide transparent sharing of campus resources in support of (multiinstitutional) collaboration Support both HTTP and non-web access using

More information

Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure

Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure Arnab K. Paul, Ryan Chard, Kyle Chard, Steven Tuecke, Ali R. Butt, Ian Foster Virginia Tech, Argonne National

More information

Software as a Service Gateways

Software as a Service Gateways Gateways with Apache Airavata Software as a Service Gateways Eroma Abeysinghe - https://sgrc.iu.edu 04/17/2018 Software as a Service Gateways Groups with actively developing and updating codes/tools. Code

More information

Galaxy Community Update

Galaxy Community Update Galaxy Community Update PAG XXVI January 17, 2018 San Diego, California, United States Dave Clements Johns Hopkins University Galaxy Team / Galaxy Community #usegalaxy @galaxyproject bit.ly/gxy-pag2018-upd

More information

StratusLab Cloud Distribution Installation. Charles Loomis (CNRS/LAL) 3 July 2014

StratusLab Cloud Distribution Installation. Charles Loomis (CNRS/LAL) 3 July 2014 StratusLab Cloud Distribution Installation Charles Loomis (CNRS/LAL) 3 July 2014 StratusLab What is it? Complete IaaS cloud distribution Open source (Apache 2 license) Works well for production private

More information

OGCE User Guide for OGCE Release 1

OGCE User Guide for OGCE Release 1 OGCE User Guide for OGCE Release 1 1 Publisher s Note Release 2 begins the migration to open standards portlets. The following has been published by the Open Grids Computing Environment: OGCE Release 2

More information

ACTIVE MICROSOFT CERTIFICATIONS:

ACTIVE MICROSOFT CERTIFICATIONS: Last Activity Recorded : July 20, 2017 Microsoft Certification ID : 2665612 MARC GROTE Wittorfer Strasse 4 Bardowick, Lower Saxony 21357 DE marc.grote@it-consulting-grote.de ACTIVE MICROSOFT CERTIFICATIONS:

More information

SAML-Based SSO Solution

SAML-Based SSO Solution About SAML SSO Solution, page 1 Single Sign on Single Service Provider Agreement, page 2 SAML-Based SSO Features, page 2 Basic Elements of a SAML SSO Solution, page 3 Cisco Unified Communications Applications

More information

Web-Based Visualization and Visual Analysis for High-Throughput Genomics. Jeremy Goecks! Computational Biology Institute

Web-Based Visualization and Visual Analysis for High-Throughput Genomics. Jeremy Goecks! Computational Biology Institute Web-Based Visualization and Visual Analysis for High-Throughput Genomics with Galaxy! Jeremy Goecks! Computational Biology Institute Topics Galaxy Visualization framework Large-scale visualization Integrated

More information

Proven video conference management software for Cisco Meeting Server

Proven video conference management software for Cisco Meeting Server Proven video conference management software for Cisco Meeting Server VQ Conference Manager (formerly Acano Manager) is your key to dependable, scalable, self-service video conferencing Increase service

More information

A More Realistic Way of Stressing the End-to-end I/O System

A More Realistic Way of Stressing the End-to-end I/O System A More Realistic Way of Stressing the End-to-end I/O System Verónica G. Vergara Larrea Sarp Oral Dustin Leverman Hai Ah Nam Feiyi Wang James Simmons CUG 2015 April 29, 2015 Chicago, IL ORNL is managed

More information

Building the Modern Research Data Portal. Developer Tutorial

Building the Modern Research Data Portal. Developer Tutorial Building the Modern Research Data Portal Developer Tutorial Thank you to our sponsors! U. S. DEPARTMENT OF ENERGY 2 Presentation material available at www.globusworld.org/workshop2016 bit.ly/globus-2016

More information

Developing Applications with Networking Capabilities via End-to-End Software Defined Networking (DANCES)

Developing Applications with Networking Capabilities via End-to-End Software Defined Networking (DANCES) Developing Applications with Networking Capabilities via End-to-End Software Defined Networking (DANCES) Kathy Benninger Pittsburgh Supercomputing Center OIN Workshop Pittsburgh, PA 18 March 2015 What

More information

One Pool To Rule Them All The CMS HTCondor/glideinWMS Global Pool. D. Mason for CMS Software & Computing

One Pool To Rule Them All The CMS HTCondor/glideinWMS Global Pool. D. Mason for CMS Software & Computing One Pool To Rule Them All The CMS HTCondor/glideinWMS Global Pool D. Mason for CMS Software & Computing 1 Going to try to give you a picture of the CMS HTCondor/ glideinwms global pool What s the use case

More information

Galaxy workshop at the Winter School Igor Makunin

Galaxy workshop at the Winter School Igor Makunin Galaxy workshop at the Winter School 2016 Igor Makunin i.makunin@uq.edu.au Winter school, UQ, July 6, 2016 Plan Overview of the Genomics Virtual Lab Introduce Galaxy, a web based platform for analysis

More information

Allowing Users to Run Services at the OLCF with Kubernetes

Allowing Users to Run Services at the OLCF with Kubernetes Allowing Users to Run Services at the OLCF with Kubernetes Jason Kincl Senior HPC Systems Engineer Ryan Adamson Senior HPC Security Engineer This work was supported by the Oak Ridge Leadership Computing

More information

SAML-Based SSO Solution

SAML-Based SSO Solution About SAML SSO Solution, page 1 SAML-Based SSO Features, page 2 Basic Elements of a SAML SSO Solution, page 2 SAML SSO Web Browsers, page 3 Cisco Unified Communications Applications that Support SAML SSO,

More information

Part2: Let s pick one cloud IaaS middleware: OpenStack. Sergio Maffioletti

Part2: Let s pick one cloud IaaS middleware: OpenStack. Sergio Maffioletti S3IT: Service and Support for Science IT Cloud middleware Part2: Let s pick one cloud IaaS middleware: OpenStack Sergio Maffioletti S3IT: Service and Support for Science IT, University of Zurich http://www.s3it.uzh.ch/

More information

2014 Bond Technology Update Progress of the Technology Network Infrastructure Upgrades Long Range Planning Committee March 4, 2015

2014 Bond Technology Update Progress of the Technology Network Infrastructure Upgrades Long Range Planning Committee March 4, 2015 2014 Bond Technology Update Progress of the Technology Network Infrastructure Upgrades Long Range Planning Committee March 4, 2015 2014 Bond Technology Update Progress of the Technology Network Infrastructure

More information

SCREAM15 Jetstream Notes

SCREAM15 Jetstream Notes SCREAM15 Jetstream Notes Slide 1 The US National Science Foundation (NSF) in 2015 awarded funding for a first- of- a- kind distributed cyberinfrastructure (DCI) system called Jetstream. Jetstream will

More information

Welcome! Presenters: STFC January 10, 2019

Welcome! Presenters: STFC January 10, 2019 Welcome! Presenters: Vas Vasiliadis vas@uchicago.edu Brendan McCollam bjmc@globus.org STFC January 10, 2019 Agenda Morning topics Introduction to the Globus SaaS Service overview & architecture Demo: A

More information

COURSE LISTING. Courses Listed. Training for Cloud with SAP Cloud Platform in Development. 23 November 2017 (08:12 GMT) Beginner.

COURSE LISTING. Courses Listed. Training for Cloud with SAP Cloud Platform in Development. 23 November 2017 (08:12 GMT) Beginner. Training for Cloud with SAP Cloud Platform in Development Courses Listed Beginner CLD100 - Cloud for SAP Intermediate CP100 - SAP Cloud Platform Certification Exam C_CP_11 - SAP Certified Development Associate

More information

COURSE LISTING. Courses Listed. Training for Database & Technology with Development in SAP Cloud Platform. 1 December 2017 (22:41 GMT) Beginner

COURSE LISTING. Courses Listed. Training for Database & Technology with Development in SAP Cloud Platform. 1 December 2017 (22:41 GMT) Beginner Training for Database & Technology with Development in SAP Cloud Platform Courses Listed Beginner CLD100 - Cloud for SAP Intermediate CP100 - SAP Cloud Platform Certification Exam C_CP_11 - SAP Certified

More information

Managing Grid Credentials

Managing Grid Credentials Managing Grid Credentials Jim Basney http://www.ncsa.uiuc.edu/~jbasney/ Senior Research Scientist Grid and Security Technologies National Center for Supercomputing Applications

More information

Scaling a Global File System to the Greatest Possible Extent, Performance, Capacity, and Number of Users

Scaling a Global File System to the Greatest Possible Extent, Performance, Capacity, and Number of Users Scaling a Global File System to the Greatest Possible Extent, Performance, Capacity, and Number of Users Phil Andrews, Bryan Banister, Patricia Kovatch, Chris Jordan San Diego Supercomputer Center University

More information

Remote & Collaborative Visualization. Texas Advanced Computing Center

Remote & Collaborative Visualization. Texas Advanced Computing Center Remote & Collaborative Visualization Texas Advanced Computing Center TACC Remote Visualization Systems Longhorn NSF XD Dell Visualization Cluster 256 nodes, each 8 cores, 48 GB (or 144 GB) memory, 2 NVIDIA

More information

The GISandbox: A Science Gateway For Geospatial Computing. Davide Del Vento, Eric Shook, Andrea Zonca

The GISandbox: A Science Gateway For Geospatial Computing. Davide Del Vento, Eric Shook, Andrea Zonca The GISandbox: A Science Gateway For Geospatial Computing Davide Del Vento, Eric Shook, Andrea Zonca 1 Paleoscape Model and Human Origins Simulate Climate and Vegetation during the Last Glacial Maximum

More information

Integrating Apache Mesos with Science Gateways via Apache Airavata

Integrating Apache Mesos with Science Gateways via Apache Airavata Integrating Apache Mesos with Science Gateways via Apache Airavata Organization: Apache Software Foundation Abstract: Science Gateways federate resources from multiple organizations. Most gateways solve

More information

COURSE LISTING. Courses Listed. with SAP Hybris Marketing Cloud. 24 January 2018 (23:53 GMT) HY760 - SAP Hybris Marketing Cloud

COURSE LISTING. Courses Listed. with SAP Hybris Marketing Cloud. 24 January 2018 (23:53 GMT) HY760 - SAP Hybris Marketing Cloud with SAP Hybris Marketing Cloud Courses Listed HY760 - SAP Hybris Marketing Cloud C_HYMC_1702 - SAP Certified Technology Associate - SAP Hybris Marketing Cloud (1702) Implementation Page 1 of 12 All available

More information

XSEDE Campus Bridging Tools Rich Knepper Jim Ferguson

XSEDE Campus Bridging Tools Rich Knepper Jim Ferguson April 3, 2014 XSEDE Campus Bridging Tools Rich Knepper (rich@iu.edu) Jim Ferguson (jwf@utk.edu) What is Campus Bridging? Bridging the gap between local researcher cyberinfrastructure, campus CI, and national

More information

XSEDE Iden ty Management Use Cases

XSEDE Iden ty Management Use Cases XSEDE Iden ty Management Use Cases January 6, 2017 Version 1.3 These use cases describe how researchers, scien sts, and other community members register themselves with the XSEDE system, manage their profile

More information

Grid Middleware and Globus Toolkit Architecture

Grid Middleware and Globus Toolkit Architecture Grid Middleware and Globus Toolkit Architecture Lisa Childers Argonne National Laboratory University of Chicago 2 Overview Grid Middleware The problem: supporting Virtual Organizations equirements Capabilities

More information

Accessible, Transparent and Reproducible Analysis with Galaxy

Accessible, Transparent and Reproducible Analysis with Galaxy Accessible, Transparent and Reproducible Analysis with Galaxy Application of Next Generation Sequencing Technologies for Whole Transcriptome and Genome Analysis ABRF 2013 Saturday, March 2, 2013 Palm Springs,

More information

Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science

Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science ECSS Symposium, 12/16/14 M. L. Norman, R. L. Moore, D. Baxter, G. Fox (Indiana U), A Majumdar, P Papadopoulos, W Pfeiffer, R. S.

More information

Grid Scheduling Architectures with Globus

Grid Scheduling Architectures with Globus Grid Scheduling Architectures with Workshop on Scheduling WS 07 Cetraro, Italy July 28, 2007 Ignacio Martin Llorente Distributed Systems Architecture Group Universidad Complutense de Madrid 1/38 Contents

More information

Docker and HPE Accelerate Digital Transformation to Enable Hybrid IT. Steven Follis Solutions Engineer Docker Inc.

Docker and HPE Accelerate Digital Transformation to Enable Hybrid IT. Steven Follis Solutions Engineer Docker Inc. Docker and HPE Accelerate Digital Transformation to Enable Hybrid IT Steven Follis Solutions Engineer Docker Inc. Containers are the Fastest Growing Cloud Enabling Technology Title source: 451 Research

More information