Data Grids, Digital Libraries, and Persistent Archives
Slide 1: Data Grids, Digital Libraries, and Persistent Archives
Reagan W. Moore

Slide 2: Archive Definition
- Computer science: the archive is the hardware and software infrastructure used to manage data
- Preservation community: the archive is the material that is being preserved

Slide 3: Persistent Archive
- Software system that manages evolution of the hardware and software infrastructure
- A persistent archive preserves the authenticity and integrity of digital entities while the underlying technology evolves
- Combination of the material that is being preserved and the infrastructure used to preserve the material

Slide 4: Data Grid
- Grid community definition: the infrastructure used to manage distributed data as a collection
- Digital library and preservation community definition: the distributed data that is being organized and managed as a collection
- A data grid is a mechanism to support sharing of data and the collection that is being shared

Slide 5: Data Sharing
- Management of access controls on local resources to share data: put controls on resources
- Creation of a collection that is being shared across distributed resources: put controls on the collection
- The SRB data grid does both: it enacts controls on resources and on collections (data and metadata)

Slide 6: Topics
- Data grids: managing distributed data
  - Distributed data management for a project
- Digital libraries: publication of data
  - Management of collection hierarchies
- Persistent archives: preservation of data
  - Management of technology evolution
- Storage Resource Broker example
  - Currently supporting all three (seven) data management environments

Slide 7: Data Management Systems (Supported by Storage Resource Broker)
- Data collecting: sensor systems, object ring buffers and portals
- Data organization: collections, manage data context
- Data sharing: data grids, manage heterogeneity of resources
- Data publication: digital libraries, support discovery
- Data preservation: persistent archives, manage technology evolution
- Data analysis: processing pipelines, manage knowledge extraction

Slide 8: Data Management Systems
- Data grid for managing distributed data
  - Latency management for bulk analyses of collections
  - Infrastructure-independent name spaces for describing data, resources, users, and state information
- Digital library for managing data context
  - Curation services for managing collections
  - Descriptive metadata for discovery
- Persistent archive to manage technology evolution
  - Interoperability mechanisms between heterogeneous storage systems and user access mechanisms

Slide 9: Provide Context for Data
- Properties of files
  - Provenance: source
  - Descriptive attributes
  - Structure
- Organize properties as metadata in a collection hierarchy
- Define operations on file properties
- Manage state information: location, replicas, containers
- Separate context management from content management
- Maintain consistency of context as operations are done on content
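The last two points (context kept apart from content, yet updated by every content operation) can be illustrated with a minimal Python sketch. It assumes an in-memory catalog. `ContextCatalog`, `Context`, and the use of SHA-256 checksums are hypothetical illustrations, not the SRB metadata catalog's actual schema or API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Context:
    """Context (metadata) for one logical file, kept apart from its bytes."""
    provenance: str
    size: int = 0
    checksum: str = ""
    replicas: list = field(default_factory=list)  # state information: locations


class ContextCatalog:
    """Hypothetical catalog in which every content operation updates context."""

    def __init__(self):
        self.content = {}  # (location, logical name) -> bytes
        self.context = {}  # logical name -> Context

    def put(self, name, data, location, provenance):
        ctx = self.context.setdefault(name, Context(provenance=provenance))
        # Writing new content makes the other replicas stale, so drop them.
        for loc in ctx.replicas:
            if loc != location:
                self.content.pop((loc, name), None)
        self.content[(location, name)] = data
        ctx.replicas = [location]
        ctx.size = len(data)
        ctx.checksum = hashlib.sha256(data).hexdigest()

    def replicate(self, name, src, dst):
        self.content[(dst, name)] = self.content[(src, name)]
        if dst not in self.context[name].replicas:
            self.context[name].replicas.append(dst)

    def delete_replica(self, name, location):
        del self.content[(location, name)]
        ctx = self.context[name]
        ctx.replicas.remove(location)
        if not ctx.replicas:  # last copy removed: the context goes too
            del self.context[name]
```

Because the catalog, not the caller, rewrites size, checksum, and replica list, the context cannot drift from the content it describes.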

Slide 10: Data Grids
- Software systems that manage distributed data
- Control global name spaces for:
  - Resources
  - Users
  - Files
  - Metadata context
- Provide standard operations on each name space
- Provide single sign-on authentication, collection management, latency management, replication, and federation
- Generic distributed data management technology

Slide 11: Managing Distributed Data
- Data access methods (web browser, DSpace, OAI-PMH)
- Storage repository
  - Storage location
  - User name
  - File name
  - File context (creation date, ...)
  - Access constraints
- Naming conventions provided by storage systems

Slide 12: Data Grids Provide a Level of Indirection for Each Naming Convention
- Data access methods (C library, Unix, web browser)
- Data collection, managed by the data grid through:
  - Logical resource name space
  - Logical user name space
  - Logical file name space
  - Logical context (metadata)
  - Control/consistency constraints
- Storage repository
  - Storage location
  - User name
  - File name
  - File context (creation date, ...)
  - Access constraints
- Data is organized as a collection
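The level of indirection for the file name space can be sketched in Python. This is a minimal illustration only: it covers just the logical file name space, and `LogicalNameSpace`, `PhysicalLocation`, and the resource names in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    resource: str  # storage system holding the bytes (hypothetical name)
    path: str      # path in that storage system's own naming convention


class LogicalNameSpace:
    """Hypothetical mapping from data grid logical names to physical locations."""

    def __init__(self):
        self._map = {}

    def register(self, logical_name, location):
        if logical_name in self._map:
            raise KeyError(f"{logical_name} already registered")
        self._map[logical_name] = location

    def resolve(self, logical_name):
        return self._map[logical_name]

    def migrate(self, logical_name, new_location):
        # Only the physical side changes; clients keep using the logical name.
        if logical_name not in self._map:
            raise KeyError(logical_name)
        self._map[logical_name] = new_location
```

For example, a record registered as `/grid/nara/rec1.xml` on a Unix file system can be migrated to an archive, and every client that resolves `/grid/nara/rec1.xml` transparently reaches the new copy.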

Slide 13: Logical Name Spaces
- Storage resources
  - Logical names for managing collections of resources
- User names (user-name / domain / data grid)
  - Distinguished names for users to manage access controls
- Digital entities (files, blobs, structured data, ...)
  - Logical name space for global identifiers for files
- Context: metadata attributes
  - Standard metadata attributes, Dublin Core
  - State information resulting from data grid operations
  - User-defined metadata

Slide 14: Logical Resource Name
- Represents a list of physical resources
- Operations on the logical resource name result in operations on the list of physical resources
  - Load leveling: write to the next physical resource in the list
  - Fault tolerance: write to k of n physical resources
  - Replication: write to each physical resource
  - Compound resource: write to the disk cache in front of the tape archive
  - Federated resource: write to the controlled resource in another data grid
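The first three write policies can be sketched in Python as a logical resource that decides which physical resources receive a write. This is a minimal sketch, assuming round-robin for load leveling and "first k available" for fault tolerance. `LogicalResource` and the policy strings are hypothetical. Compound and federated resources are not modeled.

```python
import itertools


class LogicalResource:
    """Hypothetical logical resource name backed by a list of physical resources."""

    def __init__(self, physical, policy="replicate", k=1):
        self.physical = list(physical)
        self.policy = policy
        self.k = k
        self._cursor = itertools.cycle(range(len(self.physical)))

    def targets(self, available=None):
        """Physical resources that a single write to this logical name goes to."""
        up = [r for r in self.physical if available is None or r in available]
        if self.policy == "load_level":
            # Round-robin: next resource in the list, skipping ones that are down.
            for _ in range(len(self.physical)):
                candidate = self.physical[next(self._cursor)]
                if candidate in up:
                    return [candidate]
            raise RuntimeError("no physical resource available")
        if self.policy == "fault_tolerant":
            # The write succeeds if at least k of the n resources accept it.
            if len(up) < self.k:
                raise RuntimeError(f"only {len(up)} of {self.k} required resources up")
            return up[: self.k]
        if self.policy == "replicate":
            # Every physical resource must receive a copy.
            if len(up) != len(self.physical):
                raise RuntimeError("replication needs every physical resource")
            return list(self.physical)
        raise ValueError(f"unknown policy {self.policy!r}")
```

The application only ever names the logical resource. Swapping the policy or the list of physical resources changes where the data lands without changing the application.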

Slide 15: Storage Repository Virtualization
- How does one access data stored on multiple systems?
- [Figure: a user application facing an archive, a database, and a file system]

Slide 16: Storage Repository Virtualization (Standard Operations on Logical Resource Names)
- Remote operations
  - Unix file system
  - Latency management
  - Procedures
  - Transformations
  - Third-party transfer
  - Filtering
  - Queries
- Collective operations
  - Load leveling
  - Fault tolerance
  - Replication
- Common set of operations for interacting with every type of storage repository
- [Figure: a user application reaching an archive, a database, and a file system through the common operations]
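A common set of operations over every storage repository can be sketched in Python as an abstract driver interface with two implementations. The sketch assumes only three operations (put, get, delete). `StorageDriver`, `UnixFileDriver`, `DatabaseBlobDriver`, and `copy` are hypothetical names, and an in-memory `sqlite3` database stands in for a production DBMS.

```python
import os
import sqlite3
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Common operations every storage repository driver must provide."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> bytes: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...


class UnixFileDriver(StorageDriver):
    """Maps grid paths onto files under a local root directory."""

    def __init__(self, root=None):
        self.root = root or tempfile.mkdtemp(prefix="srb-demo-")

    def _full(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def put(self, path, data):
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def get(self, path):
        with open(self._full(path), "rb") as f:
            return f.read()

    def delete(self, path):
        os.remove(self._full(path))


class DatabaseBlobDriver(StorageDriver):
    """Stores objects as blobs in a table; sqlite3 stands in for a real DBMS."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE blobs (path TEXT PRIMARY KEY, data BLOB)")

    def put(self, path, data):
        with self.conn:
            self.conn.execute("INSERT OR REPLACE INTO blobs VALUES (?, ?)", (path, data))

    def get(self, path):
        row = self.conn.execute(
            "SELECT data FROM blobs WHERE path = ?", (path,)).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return row[0]

    def delete(self, path):
        with self.conn:
            self.conn.execute("DELETE FROM blobs WHERE path = ?", (path,))


def copy(src: StorageDriver, dst: StorageDriver, path: str) -> None:
    """Written once against the common interface, works for any driver pair."""
    dst.put(path, src.get(path))
```

Supporting a new kind of repository then means writing one more driver. Higher-level operations such as `copy` stay unchanged.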

Slide 17: Logical File Name Abstraction
- How does one identify files stored on multiple systems?
- [Figure: a user application facing an archive at SDSC, a database at U Md, and a file system at NARA]

Slide 18: Context Abstraction
- Logical name space
  - Location-independent identifier
  - Persistent identifier
- Collection-owned data
  - Access controls
  - Audit trails
  - Checksums
  - Descriptive metadata
- Common naming convention and set of attributes for describing digital entities
- Inter-realm authentication
  - Single sign-on system
- [Figure: a user application reaching an archive at SDSC, a database at U Md, and a file system at U Texas]

Slide 19: Federated Server Architecture
[Figure: an application issues a read by logical name or attribute condition. SRB servers broker the request peer-to-peer and spawn SRB agents. The MCAT metadata catalog performs (1) logical-to-physical mapping, (2) identification of replicas, and (3) access and audit control. The data is then read from replicas R1 and R2 with parallel data access.]
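The three catalog steps in the figure can be sketched in Python. Given a user and either a logical name or attribute conditions, the catalog returns replica locations for permitted entries and logs every attempt. A separate helper then picks a reachable replica. `MetadataCatalog`, `CatalogEntry`, and `choose_replica` are hypothetical. The sketch does not model the actual SRB server/agent protocol.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    replicas: list  # (server, physical path) pairs
    readers: set    # users allowed to read
    attributes: dict = field(default_factory=dict)


class MetadataCatalog:
    """Hypothetical stand-in for the catalog steps in the figure."""

    def __init__(self):
        self.entries = {}  # logical name -> CatalogEntry
        self.audit = []    # (user, action, logical name, allowed)

    def resolve(self, user, logical_name=None, **conditions):
        # Select entries by logical name or by attribute condition.
        if logical_name is not None:
            names = [logical_name] if logical_name in self.entries else []
        else:
            names = [n for n, e in self.entries.items()
                     if all(e.attributes.get(k) == v for k, v in conditions.items())]
        results = {}
        for name in names:
            entry = self.entries[name]
            allowed = user in entry.readers                   # (3) access control
            self.audit.append((user, "read", name, allowed))  # (3) audit trail
            if allowed:
                results[name] = list(entry.replicas)          # (1)+(2) mapping, replicas
        return results


def choose_replica(replicas, reachable):
    """Broker picks the first replica whose server is currently reachable."""
    for server, path in replicas:
        if server in reachable:
            return server, path
    raise LookupError("no reachable replica")
```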

Slide 20: SRB Latency Management
[Figure: techniques arranged along the path from source, across the network, to destination]
- Remote proxies, staging
- Data aggregation: containers
- Prefetch
- Replication
- Server-initiated I/O
- Streaming
- Parallel I/O
- Caching
- Client-initiated I/O
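Parallel I/O can be sketched in Python by splitting a file into byte ranges, fetching the ranges concurrently, and reassembling them in order. This is a minimal sketch: `read_range` is a hypothetical callable standing in for one network stream, and threads stand in for parallel network connections.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size, streams):
    """Split [0, size) into at most `streams` contiguous, near-equal ranges."""
    if size == 0:
        return []
    step = -(-size // streams)  # ceiling division
    return [(start, min(start + step, size)) for start in range(0, size, step)]


def parallel_copy(read_range, size, streams=4):
    """Fetch byte ranges concurrently, then reassemble them in order.

    `read_range(start, end)` is a hypothetical callable returning the bytes
    in [start, end), e.g. one network stream to a remote server.
    """
    buf = bytearray(size)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = {pool.submit(read_range, s, e): (s, e)
                   for s, e in byte_ranges(size, streams)}
        for fut, (s, e) in futures.items():
            chunk = fut.result()
            if len(chunk) != e - s:
                raise IOError(f"short read for range {s}-{e}")
            buf[s:e] = chunk
    return bytes(buf)
```

Each range is independent, so the streams can overlap their network round trips. Reassembly by offset keeps the result identical to a serial transfer.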

Slide 21: Latency Management: Bulk Operations
- Bulk register
  - Create a logical name for a file
- Bulk load
  - Create a copy of the file on a data grid storage repository
- Bulk unload
  - Provide containers to hold small files and pointers to each file location
- Bulk delete
  - Mark as deleted in metadata catalog
  - After specified interval, delete file
- Bulk metadata load
- Requests for bulk operations for access control setting, ...
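The two-phase bulk delete (mark in the catalog, remove the file after an interval) can be sketched in Python together with bulk registration. This is a minimal sketch assuming an injectable clock so that the interval can be tested. `DeferredDeleteCatalog` and its methods are hypothetical names.

```python
import time


class DeferredDeleteCatalog:
    """Hypothetical catalog that marks entries deleted and purges them later."""

    def __init__(self, grace_seconds, clock=time.time):
        self.grace = grace_seconds
        self.clock = clock
        self.files = {}       # logical name -> physical path
        self.deleted_at = {}  # logical name -> time it was marked deleted

    def bulk_register(self, mapping):
        """Create logical names for many physical files in one request."""
        self.files.update(mapping)

    def bulk_delete(self, names):
        """Phase 1: only mark the entries as deleted in the catalog."""
        now = self.clock()
        for name in names:
            if name in self.files:
                self.deleted_at.setdefault(name, now)

    def visible(self):
        return sorted(n for n in self.files if n not in self.deleted_at)

    def purge(self, remove_physical):
        """Phase 2: physically remove files whose interval has expired."""
        now = self.clock()
        purged = []
        for name, marked in list(self.deleted_at.items()):
            if now - marked >= self.grace:
                remove_physical(self.files.pop(name))
                del self.deleted_at[name]
                purged.append(name)
        return purged
```

Marking is a cheap catalog update, so a large delete request returns quickly. The slow physical removals are batched into a later `purge` pass.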

Slide 22: Data Grid Federation
- Link multiple independent data grids
- Coordinate metadata between independent metadata catalogs
- Provide consistency and access constraints for each of the four logical name spaces (resources, users, files, metadata)
  - Peer-to-peer federations: data access
  - Replication federations: shared resources
  - Hierarchical federations: consistency constraints
- Tune data grid federation by implementing different consistency and access constraints

Slide 23: Federation
- Data access methods (web browser, DSpace, OAI-PMH)
- Data collection A, managed by data grid A: logical resource name space, logical user name space, logical file name space, logical context (metadata), control/consistency constraints
- Data collection B, managed by data grid B: logical resource name space, logical user name space, logical file name space, logical context (metadata), control/consistency constraints
- Access controls and consistency constraints on cross-registration of digital entities
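Cross-registration between two independent data grids can be sketched in Python. It assumes two access checks: the registering zone must trust the other zone, and the owning zone must export the entry. The registered entry points back at the owning zone, so no data is copied. `Zone`, `cross_register`, `exports`, and `trusted_zones` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Hypothetical independent data grid with its own catalog."""
    name: str
    files: dict = field(default_factory=dict)      # logical name -> (owner zone, path)
    exports: set = field(default_factory=set)      # names other zones may register
    trusted_zones: set = field(default_factory=set)

    def cross_register(self, other, logical_name, local_name=None):
        """Register `other`'s entry in this zone's catalog without copying data."""
        if other.name not in self.trusted_zones:
            raise PermissionError(f"{self.name} does not federate with {other.name}")
        if logical_name not in other.exports:
            raise PermissionError(f"{other.name} does not export {logical_name}")
        owner_zone, path = other.files[logical_name]
        # The new entry still names the owning zone, which stays authoritative.
        self.files[local_name or logical_name] = (owner_zone, path)
```

Tightening or loosening these two checks is one way to realize the different federation styles on the previous slide.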

Slide 24: Replication Constraints
[Figure: federation environments classified by resource interaction, access constraints, and consistency constraints. Environments shown include free floating, occasional interchange, replicated data, user and data replica, nomadic, replicated catalog, hierarchical zone organization, snow flake, master slave, and deep archive, grouped under peer-to-peer, replication, and hierarchical data grids. Settings shown range over user-ID sharing (none, partial, complete, one shared user-ID), resource sharing (none, partial, complete), access controls (system-set, super administrator zone control, connection from any zone), metadata synchronization (none, system-controlled partial, system-controlled complete), and system-managed replication.]

Slide 25: Generic Infrastructure
- SDSC developed the Storage Resource Broker (SRB) to support access to distributed data
- Effort started in 1996 as a DARPA-funded project
- Now supports over 30 national/international projects
- Development team of 12 staff is led by:
  - Michael Wan, data management systems
  - Arcot Rajasekar, information management systems

Slide 26: Data Grid Capabilities
- Data manipulation
  - Containers
  - Parallel I/O
  - Firewall interactions
- Resource interactions
  - Fault tolerance
  - Load leveling
  - Replication
- HIPAA security requirements
  - Authentication of all users
  - Access controls on data and metadata
  - Audit trails
  - Data encryption
  - Centralized control
- Application interfaces
  - C library, shell commands, Java, Perl, Python, WSDL, workflow
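Containers can be sketched in Python as a single byte buffer plus an index of (offset, length) pointers, so that many small files reach an archive in one write. This is a minimal in-memory sketch. `Container` and its methods are hypothetical, and the real SRB container format is not reproduced here.

```python
class Container:
    """Hypothetical container that packs many small files into one object."""

    def __init__(self):
        self._blob = bytearray()
        self.index = {}  # logical name -> (offset, length) inside the container

    def add(self, name, data):
        if name in self.index:
            raise KeyError(f"{name} already in container")
        self.index[name] = (len(self._blob), len(data))
        self._blob.extend(data)

    def read(self, name):
        offset, length = self.index[name]
        return bytes(self._blob[offset:offset + length])

    def seal(self):
        """Bytes to send to the archive in a single write instead of many."""
        return bytes(self._blob)
```

Tape archives handle a few large objects far better than many tiny ones. The index lets each small file still be read back by its own logical name.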

Slide 27: Digital Library
- Collection hierarchy for organizing data
- User-defined metadata
- Collection-level metadata
- Metadata manipulation
  - Schema extension
  - Bulk metadata processing
  - Queries on metadata
  - Access controls on metadata
- Views on collections
- Digital library APIs
  - DSpace, Fedora, OAI-PMH, web browsers
- METS metadata XML schema
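Queries on metadata within a collection hierarchy can be sketched in Python as attribute conditions evaluated over every entry under a collection prefix. The sketch assumes that the catalog is a plain dict and that conditions are (attribute, operator, value) triples. `query`, `OPS`, and the condition syntax are hypothetical, not the SRB query interface.

```python
import operator

# Comparison operators for attribute conditions (hypothetical query syntax).
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "contains": operator.contains}


def query(catalog, collection, conditions, recursive=True):
    """Return logical names under `collection` whose metadata meets every condition.

    catalog:    dict mapping logical name -> dict of attribute values
    conditions: list of (attribute, op, value) triples, e.g. ("year", ">", 2000)
    """
    prefix = collection.rstrip("/") + "/"
    hits = []
    for name, meta in catalog.items():
        if not name.startswith(prefix):
            continue
        if not recursive and "/" in name[len(prefix):]:
            continue  # skip entries that live in sub-collections
        if all(attr in meta and OPS[op](meta[attr], value)
               for attr, op, value in conditions):
            hits.append(name)
    return sorted(hits)
```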

Slide 29: Persistent Archives
- Authenticity metadata
  - Provenance
  - User logical name space
- Integrity metadata
  - Audit trails, checksums
  - Access controls
- Consistency
  - Context update on all content operations
- Persistency
  - Infrastructure independence
  - Storage repository abstraction
  - Information repository abstraction
  - Access abstraction (standard operations)
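Integrity metadata can be sketched in Python as a checksum recorded at ingest plus an audit trail of every ingest and verification. This is a minimal sketch assuming SHA-256 and in-memory storage. `ArchiveIntegrity` and its methods are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ArchiveIntegrity:
    """Hypothetical integrity metadata: stored checksums plus an audit trail."""

    def __init__(self):
        self.checksums = {}    # logical name -> checksum recorded at ingest
        self.audit_trail = []  # (UTC timestamp, user, action, logical name, outcome)

    def _log(self, user, action, name, outcome):
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append((stamp, user, action, name, outcome))

    def ingest(self, user, name, data):
        self.checksums[name] = sha256(data)
        self._log(user, "ingest", name, "ok")

    def verify(self, user, name, data):
        """Compare a (possibly migrated) copy against the ingest checksum."""
        ok = sha256(data) == self.checksums[name]
        self._log(user, "verify", name, "ok" if ok else "MISMATCH")
        return ok
```

Because the checksum is stored as metadata rather than alongside the bytes, a copy can be verified after migration to new storage technology.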

Slide 30: National Archives Persistent Archive
- [Figure: three sites, NARA, U Md, and SDSC, each running its own MCAT]
- Principal copy stored at NARA with complete metadata catalog
- Replicated copy at U Md for improved access, load balancing and disaster recovery
- Deep archive at SDSC: no user access, but a complete copy

Slide 31: Data Grid Federation - zoneSRB
- Application interfaces: C, C++, Java libraries; Linux I/O; Unix shell; Java, NT browser; Kepler actors; DLL / Python, Perl; HTTP; DSpace; OpenDAP; OAI, WSDL, WSRF
- Federation management
- Consistency and metadata management / authorization, authentication, audit
- Logical name space; latency management; data transport; metadata transport
- Catalog abstraction
  - Databases: DB2, Oracle, Sybase, Postgres, MySQL, Informix
- Storage repository virtualization
  - Archives: tape, Sam-QFS, DMF, HPSS, ADSM, UniTree, ADS
  - ORB (object ring buffer)
  - File systems: Unix, NT, Mac OS X
  - Databases: DB2, Oracle, Sybase, SQL Server, Postgres, MySQL, Informix

Slide 32: Examples of Extensibility
- Storage repository driver evolution
  - Initially supported Unix file system
  - Added archival access: UniTree, HPSS
  - Added FTP/HTTP
  - Added database blob access
  - Added database table interface
  - Added Windows file system
  - Added project archives: dCache, Castor, ADS
  - Added Object Ring Buffer, Datascope
  - Adding GridFTP version 3.3
- Database management evolution
  - Postgres
  - DB2
  - Oracle
  - Informix
  - Sybase
  - MySQL (most difficult port: no locks, no views, limited SQL)

Slide 33: Examples of Extensibility
- The 3 fundamental APIs are the C library, shell commands, and Java
  - Other access mechanisms are ported on top of these interfaces
- API evolution
  - Initial access through C library, Unix shell commands
  - Added inQ Windows browser (C++ library)
  - Added MySRB web browser (C library and shell commands)
  - Added Java (Jargon)
  - Added Perl/Python load libraries (shell commands)
  - Added WSDL (Java)
  - Added OAI-PMH, OpenDAP, DSpace digital library (Java)
  - Added Kepler actors for dataflow access (Java)
  - Adding GridFTP version 3.3 (C library)

Slide 34: Sites Using the SRB
Academia Sinica, Taiwan; ASCC, Computing Centre, Taiwan; Australian National University; Bedford Oceanography, Canada; Bioinformatics Institute, Singapore; CSIRO, Australia; Data Storage Institute, Singapore; EGEE, French National Center; GeoForschungsZentrum, Germany; James Cook University, Australia; KEK High Energy Physics, Japan; Max Planck Institute, Netherlands; Parallab, Norway; South Australian Advanced Computing; UIB (Parallab), Norway; University of Amsterdam; University of Cambridge, Astronomy; University of Cambridge, e-Science; University of Edinburgh; University of Genoa, Italy; University of Hong Kong; University of Manchester; University of Oslo; University of Southampton; York University (UK); CiteSeer, Penn State; City University of New York; Geospatial Environment, UCSD; Drexel University; EOSDIS Distributed Active, NASA Goddard; Georgia Tech; Kentucky State Libraries & Archives; Library of Congress; Los Alamos National Lab; NASA Ames; NASA Goddard Space Flight Center; NCSA Grid Computing; NIH (NCI Center for Bioinformatics); Penn State University; Pittsburgh Supercomputing Center; Purdue University, Indiana; Stanford University; TACC, University of Texas; Texas A&M; UC Santa Cruz; UCLA; UCSD Neuroscience; University of Maryland; University of Michigan, CAC department; University of New Mexico; University of Washington; University of Wisconsin; USC; Yale University

Slide 35: Storage Resource Broker Collections at SDSC (11/2/2004)

| Collection | GBs of data stored | Number of files | Number of users |
|---|---|---|---|
| Data Grid | | | |
| NSF/ITR - National Virtual Observatory | 53,858 | 9,536,... | |
| NSF - ... | 24,738 | 5,754,... | |
| Hayden Planetarium - Evolution of the Solar System visualizations | 7,... | ... | |
| NSF/NPACI - Joint Center for Structural Genomics | 5,... | ... | |
| NSF/NPACI - Biology and Environmental collections | 8,851 | 33,... | |
| NSF - TeraGrid, ENZO Cosmology simulations | 121,550 | 1,096,947 | 3,247 |
| NIH - Biomedical Informatics Research Network | 6,002 | 4,107,... | |
| Digital Library | | | |
| NLM - Digital Embryo image collection | ... | ... | |
| NSF/NPACI - Long Term Ecological Reserve | 253 | 8,... | |
| NSF/NPACI - Grid Portal | 2,211 | 51,... | |
| NIH - Alliance for Cell Signaling microarray data | ... | ... | |
| NSF - National Science Digital Library SIO Explorer collection | 2,... | ... | |
| NSF/NPACI - Transana education research video collection | 92 | 2,... | |
| NSF/ITR - Southern California Earthquake Center | 91,040 | 1,791,... | |
| Persistent Archive | | | |
| UCSD Libraries archive | ... | ... | |
| NARA - Research Prototype Persistent Archive | ... | ... | |
| NSF - National Science Digital Library persistent archive | 3,571 | 26,908,... | |
| TOTAL | 328 TB | 51 million | 4,900 |

Slide 36: Grid Interfaces
- GSI: support for versions 1, 2, 3, Java
- GridFTP version 3.3 interface to SRB collection
  - Use GSI certificate to identify the user to the SRB
  - Reference file by an SRB logical name
  - Use SRB access controls for allowed operations
  - Initially support serial transport
  - SRB supports 4 different firewall interaction protocols (client-driven parallel I/O, server-driven parallel I/O, bulk file registration, federated data grid access)
- GridFTP version 3.3 driver for SRB collection
  - Store data at a remote site under the SRB ID: data will be shareable through SRB access controls
  - Store data at a remote site under the user's GSI certificate: data will not be shareable through SRB access controls

Slide 37: Grid Interfaces
- Replica Location Service interface (Simon Metson)
  - GMCat mimics the LRC interface, enabling the files registered in an MCAT to appear in the Giggle framework (RLS)
  - Available from ... (also linked from the third-party software list on the SRB page)
- Storage Resource Manager (SRM)
  - SRM version 1: SRB driver created to store data in SRM
  - SRM version 2: development effort to put an SRM interface on top of SRB (Alasdair Earl)
  - SRM version 3: development effort to put an SRM interface on top of SRB (Peter Kunszt)

Slide 38: Conclusion
- Distributed data management systems can be built on generic data grid infrastructure
  - Data grids to support bulk access across remote sites
  - Integration of data grid and digital library capabilities to manage massive data collections
  - Federation of data grids to build international discipline-wide collections

Slide 39: SDSC SRB Team
- Team (photo, left to right): Arun Jagatheesan, George Kremenek, Sheau-Yen Chen, Arcot Rajasekar (SRB development lead), Reagan Moore (SRB PI), Michael Wan (SRB architect), Roman Olschanowsky (BIRN), Bing Zhu, Charlie Cowart, Lucas Gilbert, Tim Warnock, Wayne Schroeder (SRB product), Adam Birnbaum (SRB production), Antoine De Torcy, Vicky Rowley (BIRN), Marcio Faerman (SCEC)
- Students & emeritus: Erik Vandekieft, Reena Mathew, Xi (Cynthia) Sheng, Allen Ding, Grace Lin, Qiao Xin, Daniel Moore, Ethan Chen, Jon Weinberg
- Supported by about 20 projects (NSF, DOE, NASA, NARA, NIH, LOC, NHPRC)

Slide 40: For More Information
Reagan W. Moore
More informationIng. José A. Mejía Villar M.Sc. Computing Center of the Alfred Wegener Institute for Polar and Marine Research
Ing. José A. Mejía Villar M.Sc. jmejia@awi.de Computing Center of the Alfred Wegener Institute for Polar and Marine Research 29. November 2011 Contents 1. Fedora Commons Repository 2. Federico 3. Federico's
More informationStorage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan
Storage Virtualization Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan Storage Virtualization In computer science, storage virtualization uses virtualization to enable better functionality
More informationAMGA metadata catalogue system
AMGA metadata catalogue system Hurng-Chun Lee ACGrid School, Hanoi, Vietnam www.eu-egee.org EGEE and glite are registered trademarks Outline AMGA overview AMGA Background and Motivation for AMGA Interface,
More informationA Quick guide to AFS Simon Wilkinson school of Informatics University of Edinburgh
A Quick guide to AFS Simon Wilkinson school of Informatics University of Edinburgh simon@sxw.org.uk AFS Originally developed by Carnegie Mellon University as part of Project Andrew Commercialised by Transarc,
More informationBuilding a Digital Repository on a Shoestring Budget
Building a Digital Repository on a Shoestring Budget Christinger Tomer University of Pittsburgh! PALA September 30, 2014 A version this presentation is available at http://www.pitt.edu/~ctomer/shoestring/
More informationInternet2 Meeting September 2005
End User Agents: extending the "intelligence" to the edge in Distributed Systems Internet2 Meeting California Institute of Technology 1 OUTLINE (Monitoring Agents using a Large, Integrated s Architecture)
More informationRichard Marciano Alexandra Chassanoff David Pcolar Bing Zhu Chien-Yi Hu. March 24, 2010
Richard Marciano Alexandra Chassanoff David Pcolar Bing Zhu Chien-Yi Hu March 24, 2010 What is the feasibility of repository interoperability at the policy level? Can a preservation environment be assembled
More informationGrid Computing. MCSN - N. Tonellotto - Distributed Enabling Platforms
Grid Computing 1 Resource sharing Elements of Grid Computing - Computers, data, storage, sensors, networks, - Sharing always conditional: issues of trust, policy, negotiation, payment, Coordinated problem
More informationTechnical Overview. Access control lists define the users, groups, and roles that can access content as well as the operations that can be performed.
Technical Overview Technical Overview Standards based Architecture Scalable Secure Entirely Web Based Browser Independent Document Format independent LDAP integration Distributed Architecture Multiple
More informationManaging Petabytes of data with irods. Jean-Yves Nief CC-IN2P3 France
Managing Petabytes of data with irods Jean-Yves Nief CC-IN2P3 France Talk overview Data management context. Some data management goals: Storage virtualization. Virtualization of the data management policy.
More informationInstitutional Repository using DSpace. Yatrik Patel Scientist D (CS)
Institutional Repository using DSpace Yatrik Patel Scientist D (CS) yatrik@inflibnet.ac.in What is Institutional Repository? Institutional repositories [are]... digital collections capturing and preserving
More informationIntroduction to Grid Computing
Milestone 2 Include the names of the papers You only have a page be selective about what you include Be specific; summarize the authors contributions, not just what the paper is about. You might be able
More informationThe Earth System Grid: A Visualisation Solution. Gary Strand
The Earth System Grid: A Visualisation Solution Gary Strand Introduction Acknowledgments PI s Ian Foster (ANL) Don Middleton (NCAR) Dean Williams (LLNL) ESG Development Team Veronika Nefedova (ANL) Ann
More informationThe NASA/GSFC Advanced Data Grid: A Prototype for Future Earth Science Ground System Architectures
The NASA/GSFC Advanced Data Grid: A Prototype for Future Earth Science Ground System Architectures Samuel D. Gasster, Craig A. Lee, Brooks Davis, Matt Clark, Mike AuYeung, John R. Wilson Computer Systems
More informationThe Materials Data Facility
The Materials Data Facility Ben Blaiszik (blaiszik@uchicago.edu), Kyle Chard (chard@uchicago.edu) Ian Foster (foster@uchicago.edu) materialsdatafacility.org What is MDF? We aim to make it simple for materials
More informationThe OAIS Reference Model: current implementations
The OAIS Reference Model: current implementations Michael Day, UKOLN, University of Bath m.day@ukoln.ac.uk Chinese-European Workshop on Digital Preservation, Beijing, China, 14-16 July 2004 Presentation
More informationirods usage at CC-IN2P3 Jean-Yves Nief
irods usage at CC-IN2P3 Jean-Yves Nief Talk overview What is CC-IN2P3? Who is using irods? irods administration: Hardware setup. irods interaction with other services: Mass Storage System, backup system,
More informationMetadata Management in Grid Database Federation
Metadata Management in Grid Database Federation Lichun Zhu zhu19@uwindsor.ca Course: 60-510, Survey, Fall 2006 University of Windsor Instructor: Dr. Richard Frost Metadata management is a common issue
More informationA High-Level Distributed Execution Framework for Scientific Workflows
A High-Level Distributed Execution Framework for Scientific Workflows Jianwu Wang 1, Ilkay Altintas 1, Chad Berkley 2, Lucas Gilbert 1, Matthew B. Jones 2 1 San Diego Supercomputer Center, UCSD, U.S.A.
More informationA GridFTP Transport Driver for Globus XIO
A GridFTP Transport Driver for Globus XIO Rajkumar Kettimuthu 1,2, Liu Wantao 3,4, Joseph Link 5, and John Bresnahan 1,2,3 1 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne,
More informationSimplifying Collaboration in the Cloud
Simplifying Collaboration in the Cloud WOS and IRODS Data Grid Dave Fellinger dfellinger@ddn.com Innovating in Storage DDN Firsts: Streaming ingest from satellite with guaranteed bandwidth Continuous service
More informationGlobus GTK and Grid Services
Globus GTK and Grid Services Michael Rokitka SUNY@Buffalo CSE510B 9/2007 OGSA The Open Grid Services Architecture What are some key requirements of Grid computing? Interoperability: Critical due to nature
More informationLHC and LSST Use Cases
LHC and LSST Use Cases Depots Network 0 100 200 300 A B C Paul Sheldon & Alan Tackett Vanderbilt University LHC Data Movement and Placement n Model must evolve n Was: Hierarchical, strategic pre- placement
More informationIndex Introduction Setting up an account Searching and accessing Download Advanced features
ESGF Earth System Grid Federation Tutorial Index Introduction Setting up an account Searching and accessing Download Advanced features Index Introduction IT Challenges of Climate Change Research ESGF Introduction
More informationMetadata Catalogue Issues. Daan Broeder Max-Planck Institute for Psycholinguistics
Metadata Catalogue Issues Daan Broeder Max-Planck Institute for Psycholinguistics Introduction Methods of registering resources Metadata Making metadata interoperable Exposing metadata Facilitating resource
More informationCMB-207-1I Citrix Desktop Virtualization Fast Track
Page1 CMB-207-1I Citrix Desktop Virtualization Fast Track This fast-paced course covers select content from training courses CXA-206: Citrix XenApp 6.5 Administration and CXD-202: Citrix XenDesktop 5 Administration
More informationNUIT Tech Talk Topics in Research Computing: XSEDE and Northwestern University Campus Champions
NUIT Tech Talk Topics in Research Computing: XSEDE and Northwestern University Campus Champions Pradeep Sivakumar pradeep-sivakumar@northwestern.edu Contents What is XSEDE? Introduction Who uses XSEDE?
More informationHow to use Water Data to Produce Knowledge: Data Sharing with the CUAHSI Water Data Center
How to use Water Data to Produce Knowledge: Data Sharing with the CUAHSI Water Data Center Jon Pollak The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) August 20,
More informationDSpace development from MIT's Digital Library Research Program
DSpace development from MIT's Digital Library Research Program MacKenzie Smith Associate Director for Technology MIT Libraries Digital Library Research Many, many hard problems Can t wait for perfect solutions
More informationIndiana University s Lustre WAN: The TeraGrid and Beyond
Indiana University s Lustre WAN: The TeraGrid and Beyond Stephen C. Simms Manager, Data Capacitor Project TeraGrid Site Lead, Indiana University ssimms@indiana.edu Lustre User Group Meeting April 17, 2009
More informationDataONE Enabling Cyberinfrastructure for the Biological, Environmental and Earth Sciences
DataONE Enabling Cyberinfrastructure for the Biological, Environmental and Earth Sciences William K. Michener 1,2, Rebecca Koskela 1,2, Matthew B. Jones 2,3, Robert B. Cook 2,4, Mike Frame 2,5, Bruce Wilson
More informationDIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM
OMB No. 3137 0071, Exp. Date: 09/30/2015 DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM Introduction: IMLS is committed to expanding public access to IMLS-funded research, data and other digital products:
More informationDigital Preservation DMFUG 2017
Digital Preservation DMFUG 2017 1 The need, the goal, a tutorial In 2000, the University of California, Berkeley estimated that 93% of the world's yearly intellectual output is produced in digital form
More informationAn overview of the OAIS and Representation Information
An overview of the OAIS and Representation Information JORUM, DCC and JISC Forum Long-term Curation and Preservation of Learning Objects February 9 th 2006 University of Glasgow Manjula Patel UKOLN and
More informationFundamentals of. Database Systems. Shamkant B. Navathe. College of Computing Georgia Institute of Technology PEARSON.
Fundamentals of Database Systems 5th Edition Ramez Elmasri Department of Computer Science and Engineering The University of Texas at Arlington Shamkant B. Navathe College of Computing Georgia Institute
More informationBuilding on Existing Communities: the Virtual Astronomical Observatory (and NIST)
Building on Existing Communities: the Virtual Astronomical Observatory (and NIST) Robert Hanisch Space Telescope Science Institute Director, Virtual Astronomical Observatory Data in astronomy 2 ~70 major
More informationDelivering Data Management for Engineers on the Grid 1
Delivering Data Management for Engineers on the Grid 1 Jasmin Wason, Marc Molinari, Zhuoan Jiao, and Simon J. Cox School of Engineering Sciences, University of Southampton, UK {j.l.wason, m.molinari, z.jiao,
More informationScientific data management
Scientific data management Storage and data management components Application database Certificate Certificate Authorised users directory Certificate Certificate Researcher Certificate Policies Information
More informationImplementing a Data Publishing Service via DSpace. Jon W. Dunn, Randall Floyd, Garett Montanez, Kurt Seiffert
Implementing a Data Publishing Service via DSpace Jon W. Dunn, Randall Floyd, Garett Montanez, Kurt Seiffert May 20, 2009 Outline IUScholarWorks Massive Data Storage Service Example of the data publishing
More informationDistributed Repository for Biomedical Applications
Distributed Repository for Biomedical Applications L. Corradi, I. Porro, A. Schenone, M. Fato University of Genoa Dept. Computer Communication and System Sciences (DIST) BIOLAB Contact: ivan.porro@unige.it
More informationIndiana University Research Technology and the Research Data Alliance
Indiana University Research Technology and the Research Data Alliance Rob Quick Manager High Throughput Computing Operations Officer - OSG and SWAMP Board Member - RDA Organizational Assembly RDA Mission
More information