Integration of Cloud Storage with Data Grids

M. WAN
University of California, San Diego, CA, USA
AND
R. MOORE AND A. RAJASEKAR
University of North Carolina, Chapel Hill, NC, USA

The integrated Rule-Oriented Data System (iRODS) is a data grid that organizes distributed data into a sharable collection while enforcing management policies. The Amazon Simple Storage Service (S3) is an internet-based cloud storage service that allows users to store and retrieve data from anywhere, at any time, on the web. Although S3 provides robust storage, it offers little beyond basic storage. The iRODS system provides a rich set of authentication, authorization and auditing facilities; a means to associate descriptive metadata with data stored in iRODS, through which users can discover and share data; and a means to maintain the integrity and authenticity of data and to recover from corruption through replication strategies. iRODS provides policy-based data management that allows each community of collaborating users to customize the management policies for the complete life cycle of its data. We have integrated S3 storage with iRODS so that users have a rich set of functionality layered on top of the simple cloud storage offered by S3. The integration was accomplished using the "Compound Resource Framework", one of the integration methods in iRODS. The compound resource framework provides an intermediate cache between the two systems that allows iRODS to manage the protocol mismatch between the put/get functionality exposed by S3 and the richer, POSIX-like I/O of iRODS. The framework also makes data transfer between the client and S3 efficient by using the cache to absorb the bandwidth and latency mismatch between the client system and the S3 host. The integration of iRODS with the S3 cloud storage system gives the user full-fledged data management functionality on top of the storage functionality offered by S3.
Moreover, since iRODS can manage distributed resources, the integration makes it possible to integrate, discover and access data stored in multiple, diverse cloud and non-cloud storage systems.

Categories and Subject Descriptors: H.3.2 [Information Storage]: File Organization
General Terms: Design, Management
Additional Key Words and Phrases: Storage model, cloud storage, data grids and rule-oriented systems
ACM File Format: WAN, M., MOORE, R., AND RAJASEKAR, A. Integration of Cloud Storage with Data Grids. Proc. Third International Conference on the Virtual Computing Initiative (October 2009), 10 pages.

1. INTRODUCTION

The integrated Rule-Oriented Data System (iRODS) [1,2,3] is a data grid that organizes distributed data into a sharable collection while enforcing management policies. The Amazon Simple Storage Service (S3) [4] is an internet-based cloud storage service that allows users to store and retrieve data from anywhere, at any time, on the web. Although S3 provides robust storage, it offers little beyond basic storage. The iRODS system provides a rich set of authentication, authorization and auditing facilities; a means to associate descriptive metadata with data stored in iRODS, through which users can discover and share data; and a means to maintain the integrity and authenticity of data and to recover from corruption through replication strategies. iRODS provides policy-based data management that allows each community of collaborating users to customize the management of the complete life cycle of its data. The integration of iRODS with the S3 cloud storage system gives the user rich data management functionality on top of the storage functionality offered by S3. Moreover, since iRODS can manage distributed resources, the integration makes it possible to integrate, discover and access data stored in multiple, diverse cloud and non-cloud storage systems.

Cloud computing [5,6] is an emerging model of on-demand computing that offers an alternative to in-house computing. It is also a business model in which virtualized computing resources are provided as a service by a third-party provider over a wide area network. Users buy time on these computing resources as needed, without having to install, maintain or upgrade local infrastructure. Cloud computing is well suited to meeting peak loads and short-term demands, accommodating a changing user base, and providing fault tolerance. Amazon's Elastic Compute Cloud (EC2) [7], Google App Engine [8], Sun Cloud [9], IBM CloudBurst [10], Microsoft Azure [11] and GoGrid [12] are examples of vendor-provided cloud computing. Alongside cloud computing, services have also emerged that provision storage on demand, called cloud storage [13]. Cloud storage provides network-accessible capacity for storing files at a remote server site. Like cloud computing, cloud storage is a service provided by a third party: users pay for the storage they use and for the bandwidth they consume when importing data into and exporting data from the cloud storage system. Cloud storage is useful for off-line storage of files (disaster recovery and fault tolerance), caching of data near cloud computing resources, meeting temporary spikes in storage needs, and providing better and more reliable web hosting services (load balancing). Amazon's Simple Storage Service (S3) [4], Nirvanix [14], Google Docs [15] and Rackspace Cloud [16] are examples of cloud storage services.

The Grid is the software infrastructure that links distributed computational resources such as people, computers, sensors and data [17]. The Data Grid links distributed storage resources, from archival systems, to caches, to databases. The data within the Data Grid are mapped to a uniform logical name space to create global, persistent collections.
It is possible to create and manage geographically distributed replicas of the digital entities that are registered into the collection. The naming convention for the digital entities can be global in scale, making it possible to use data grids to share data across continents. Data Grids enable sharing by providing network-wide user identification, third-party access control, and a means to associate descriptive metadata with stored data so that community users can discover and access relevant data. In addition to serving as data sharing environments, data grids can also be used to support the publication and preservation of data. Examples of data grids include the Storage Resource Broker [18], the integrated Rule-Oriented Data System [1,2,3] and the Globus Data Grid [19]. Examples of data grid usage can be found in [18].

We present an integration architecture that combines the benefits of cloud storage systems and data grids, providing users with rich distributed data management capabilities on top of the reliability, ease of use and cost-effectiveness of the cloud storage paradigm. We demonstrate the feasibility of the approach by integrating the iRODS data grid with the Amazon S3 cloud environment.

2. AMAZON SIMPLE STORAGE SERVICE (S3)

Amazon S3 [4] is a pioneering cloud storage system that enables users to extend their storage capacity without a large capital outlay. S3 provides internet-based access to storage at its site through a web-service interface. Through this interface users can store and retrieve any amount of data, at any time, from anywhere on the web, which provides scalability, reliability and portability. The service relieves small and large organizations alike of the capital costs of installing large storage banks and of the recurring costs of maintaining and periodically upgrading them. Moreover, organizations need not worry about meeting peak demand, recovering from faults, or provisioning large-bandwidth networks for data distribution. The shift to the pay-per-use, service-based model of S3 supports an agile development cycle and makes it possible to experiment with new ideas without incurring large upfront costs.

The key characteristics of the S3 storage system are as follows [4]:
- Improved agility, allowing strategies for the use of storage to change.
- Reduced cost, due to reductions in capital and operational expenditures.
- Portability and access from any location, increasing the range of usage models that one can achieve.
- Sharing of resources with other users, who may be co-located.
- Improved reliability, achieved through the redundant storage offered by S3.
- Extensibility and scalability, as the S3 model places no restrictions on the amount of storage provided; one can increase or decrease the use of space as needed.
- Closer integration with cloud computing (Amazon EC2).

Amazon's S3 provides a web service interface (SOAP and REST) for ingesting and accessing files in its cloud storage system. The interface allows a user to read, write and delete files up to 5 gigabytes in size (the limit at the time of writing). Files are stored in logically named containers called buckets. Bucket names and file names (keys) are user-defined, and a user account can have up to 100 buckets. The system also allows one to list the files in a bucket and to query system metadata about each file. One can store an unlimited number of files in any bucket by giving each file a unique name (key). There is no concept of a hierarchical directory structure in S3, but nothing prevents the use of "/" in file names. Buckets can be designated to reside either in Europe or in the United States, and files stored in a bucket are kept in a storage system in that location. Internally, files can be stored anywhere within these areas, and Amazon does not specify the redundancy it maintains for disaster recovery.
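
To make the flat bucket/key model concrete, the following is a minimal in-memory sketch in Python. It is not the S3 API and never contacts S3; the class ToyBucket and its methods are hypothetical. It assumes only what the text states (unique, user-chosen keys in a flat namespace, with "/" allowed in names), plus a listing operation modeled loosely on the prefix and delimiter parameters of S3's bucket-listing request, which is one common way clients present pseudo-folders over such a namespace.

```python
"""Toy model of an S3-style bucket: a flat namespace of user-chosen keys.

Hypothetical illustration only; it does not talk to Amazon S3.
"""
from typing import Dict, List, Tuple


class ToyBucket:
    """Flat map from unique keys to object bytes (no real directories)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Whole-object write; putting to an existing key replaces the object.
        self.objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        # Whole-object read.
        return self.objects[key]

    def get_range(self, key: str, start: int, end: int) -> bytes:
        # Partial read of bytes [start, end), in the spirit of a partial download.
        return self.objects[key][start:end]

    def delete(self, key: str) -> None:
        # Permanent removal: there is no trash can to recover from.
        del self.objects[key]

    def list_keys(self, prefix: str = "", delimiter: str = "") -> Tuple[List[str], List[str]]:
        """Return (keys, common_prefixes) for keys starting with `prefix`.

        If `delimiter` is given, every key whose remainder (after `prefix`)
        contains the delimiter is rolled up into one "common prefix" ending
        at the first delimiter. This is how "/" in key names can be shown
        as folders even though the namespace itself is flat.
        """
        keys: List[str] = []
        prefixes = set()
        for key in sorted(self.objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                cut = rest.index(delimiter) + len(delimiter)
                prefixes.add(prefix + rest[:cut])
            else:
                keys.append(key)
        return keys, sorted(prefixes)
```

For example, after putting 2009/run1/img001.fits, 2009/run1/img002.fits and 2009/notes.txt, list_keys("2009/", "/") returns (["2009/notes.txt"], ["2009/run1/"]): the "folder" run1/ exists only as a shared key prefix.
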
Access to a file in S3 is independent of its storage location. S3 also provides access control mechanisms that protect data and enable sharing among users. S3's pricing policy differentiates between storage costs and the costs of transferring data into and out of S3. The cost of storage is proportional to the number of gigabytes used, and the per-gigabyte rate decreases as the number of terabytes stored increases. Data transfer costs are higher for ingestion than for access. Storage and data transfer costs also differ between the European and US sites, and data movement between sites is charged as well. With such a pricing policy, S3 makes it possible for small enterprises to use compute and storage capabilities immediately without building an extensive IT department, and for large enterprises to beta-test new projects and directions without additional capital equipment or IT staff time.

S3 provides a web-service interface using both the REST and SOAP protocols. It also supports other protocols, such as BitTorrent, for accessing files from multiple sites. The web service interface provides the following service end points (we give brief explanations here; more complete documentation can be found in [4]):
- ListAllMyBuckets: returns the names of all buckets owned by the user
- CreateBucket: creates a new bucket
- DeleteBucket: deletes an empty bucket
- ListBucket: lists the objects in a bucket that meet the given search criteria
- GetBucketAccessControlPolicy: shows the ACLs for a bucket
- SetBucketAccessControlPolicy: sets the ACLs for a bucket for a given user
- GetBucketLoggingStatus: shows the logging status of a bucket
- SetBucketLoggingStatus: sets the logging status, i.e., which actions are logged
- PutObjectInline: ingests an object that is part of a SOAP message
- PutObject: ingests an object that is given as a DIME attachment
- CopyObject: copies an object from one bucket to another, possibly with a different name
- GetObject: downloads a complete object
- GetObjectExtended: can download a partial object meeting given criteria
- DeleteObject: removes an object from a bucket (there is no trash-can facility)
- GetObjectAccessControlPolicy: shows the ACLs for an object
- SetObjectAccessControlPolicy: sets the ACLs for an object for a given user

Several third-party groups have built user interfaces that hide the intricacies of Amazon's web service interface, and several commercial enterprises offer value-added services on top of S3. Reference [20] lists tools available for storing files in S3.

2. INTEGRATED RULE-ORIENTED DATA SYSTEMS

The integrated Rule-Oriented Data System (iRODS) [1,2,3] is peer-to-peer data grid middleware that provides facilities for building, managing, querying, accessing and preserving collections in a distributed data grid framework. The iRODS system applies policy-based control when performing these functions. In brief, iRODS provides the following capabilities:
- Global persistent identifiers for naming digital objects. A unique identifier is used for each object stored in iRODS. Replicas and versions of the same object share the same global identifier but differ in replication and version metadata.
- Support for metadata to identify system-level physical properties of the stored data object.
  The properties stored include physical resource location, path names (or canned SQL queries in the case of database resources), owner, creation time, modification time, expiration time, file type, access controls, file size, location of replicas, aggregation in a container, etc.
- Support for descriptive metadata to enable discovery through simple query mechanisms. iRODS supports metadata in the form of attribute-value-unit triplets, and any number of such triplets can be associated with each digital object.
- Standard access mechanisms. Interfaces include Web browsers, Unix shell commands, Windows browsers, Python load libraries, Java, C library calls, a FUSE-based file interface, WebDAV, and the Kepler and Taverna workflow systems.
- Storage repository abstraction. Files may be stored in multiple types of storage systems, including tape systems, disk systems, databases and now cloud storage.
- An inter-realm authentication system for secure access to remote storage systems, including secure passwords and certificate-based authentication such as GSI.
- Support for replication and synchronization of files between resource sites.
- Support for caching copies of files on a local storage system and for accessing files in an archive using the compound resource methodology. This includes the concept of multiple replicas of an object with distinct usage models: archives are used for safe copies and caches for immediate access.
- Support for physically aggregating files into tar files to optimize the management of large numbers of small files.
- Access controls and audit trails to control and track data usage.
- Support for executing remote operations, such as data subsetting, metadata extraction, indexing and remote data movement, using micro-services and rules.
- Support for rich I/O models for file ingestion and access, including in-situ registration of files into the system, inline transfer of small files, and parallel transfer of large files.
- Support for federation of data grids. Two independently managed persistent archives can establish a defined level of trust for the exchange of materials and access in one or both directions. This concept is very useful for developing a full-fledged preservation environment with dark and light archives.

The iRODS data grid system consists of several components. A metadata catalog server, called the iCAT server, provides the metadata and abstraction services for the whole data grid. Multiple resource servers (iRES) provide access to storage resources, and a single resource server can provide access to more than one storage resource. The system can support any number of clients at a time. A client can connect to any server in the grid and request access to digital objects. The request is parsed using the contextual and system information stored in the iCAT catalog, and a physical object is identified and transferred to the client. The request can be expressed in terms of logical object names or as a conditional query based on descriptive and system metadata attributes. Because iRODS is a peer-to-peer server system, requests can be made to any server, which in turn acts as a broker on behalf of the client for transferring the file. The final file transfer takes the shortest path in terms of the number of hops.

An important aspect of iRODS is its built-in rule framework.
As part of each resource server, a distributed rule engine provides extensibility and customizability by encoding server-side operations (including the main access APIs) as sequences of micro-services. The sequence of micro-services is controlled by user-defined and/or administrator-defined Event-Condition-Action rules similar to those found in active databases. The rules can be viewed as defining pipelines or workflows. An ingestion or access process can be encoded as a rule to provide customized functionality. Rules can also be defined by users and executed interactively. Hence, a user can easily construct, test and deploy changes to a particular process or policy without the help of system and application developers. The user can also define the conditions under which a rule is triggered, thus controlling which rules (or processing pipelines) are applied based on current events and operating conditions. Programming rules in iRODS can be viewed as Lego-block programming: the building blocks of iRODS rules are micro-services, which are small, well-defined procedures or functions that each perform a specific task. For example, one may encode a rule stating that when a data object in a collection C is accessed, additional authorization checks must be made. These authorization checks can be encoded as a set of micro-services with different triggers that fire based on current operating conditions. In this way, one can control access to sensitive data through rules and can raise or lower authorization levels dynamically as the situation warrants. The design of the iRODS rule engine builds on theories and concepts from a wide range of well-known paradigms, including active databases, transactional systems, logic programming, business rule systems, constraint-management systems, workflows, service-oriented architecture and program verification.
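
The Event-Condition-Action style described above can be illustrated with a minimal Python sketch that encodes the example from the text, an extra authorization check on accesses to objects in a collection C, as rules built from two small micro-services. Everything here (RuleEngine, require_group, log_access, the event name on_access and the path /tempZone/home/C/) is a hypothetical stand-in, not iRODS rule syntax or its micro-service API. The sketch simply runs every matching rule in registration order; it does not model how iRODS chooses among alternative rules or executes recovery chains.

```python
"""Minimal sketch of Event-Condition-Action rules built from micro-services.

Hypothetical illustration: names, paths and the data model are assumptions
and are not the iRODS rule language or its micro-service API.
"""
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]                  # state of the request being processed
MicroService = Callable[[Context], None]  # small, single-purpose step
Condition = Callable[[Context], bool]


@dataclass
class Rule:
    event: str                   # e.g. "on_access"
    condition: Condition         # rule fires only if this returns True
    actions: List[MicroService]  # run in order, like a small pipeline


@dataclass
class RuleEngine:
    rules: List[Rule] = field(default_factory=list)

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def fire(self, event: str, ctx: Context) -> int:
        """Run every rule for `event` whose condition holds, in registration
        order, and return how many rules fired. A micro-service that raises
        aborts the request (no recovery chain is modeled here)."""
        fired = 0
        for rule in self.rules:
            if rule.event == event and rule.condition(ctx):
                for service in rule.actions:
                    service(ctx)
                fired += 1
        return fired


# Hypothetical collection "C" whose objects need extra authorization.
COLLECTION_C = "/tempZone/home/C/"


def require_group(group: str) -> MicroService:
    """Build a micro-service that rejects users outside `group`."""
    def check(ctx: Context) -> None:
        if ctx.get("group") != group:
            raise PermissionError(f"{ctx.get('user')} is not in group {group}")
    return check


def log_access(ctx: Context) -> None:
    """Micro-service that appends an audit record to the context."""
    ctx.setdefault("audit", []).append(f"{ctx['user']} read {ctx['path']}")


def build_engine() -> RuleEngine:
    engine = RuleEngine()
    # Extra authorization check, only for objects inside collection C.
    engine.add(Rule("on_access",
                    lambda ctx: str(ctx["path"]).startswith(COLLECTION_C),
                    [require_group("curators")]))
    # Audit every access, wherever the object lives.
    engine.add(Rule("on_access", lambda ctx: True, [log_access]))
    return engine
```

Because each micro-service is an ordinary function, changing a policy, for instance adding another check to the rule for collection C, amounts to appending one more function to that rule's action list.
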
Apart from the iRES servers and the iCAT server, iRODS has two other servers: iSEC, for scheduling and executing queued rules, and iXMS, for providing a message-passing framework between micro-services. Figure 1 shows the various components of the iRODS system as well as some of its user interfaces.

Figure 1. iRODS Architecture

The iRODS system is in production use in multiple projects, including the US National Archives Transcontinental Persistent Archive Prototype (TPAP) [21], the NSF Science of Learning Centers [22], the Australian Research Collaboration Services [23] and the SHAMAN project in the UK [24].

3. INTEGRATING iRODS AND AMAZON'S S3

Amazon's S3 provides a powerful, easy-to-deploy, internet-based file storage system. However, it does not provide other capabilities that would make it user-friendly and easy to integrate with existing storage systems, and it lacks many of the features needed to use it as a long-term, highly available, sharable resource. As it stands, it is good for parking files for projects, for use as a backup web site with public access, and as a storage system integrated with Amazon's EC2 compute cloud. Some of the capabilities that can be added to S3 to make it more viable are:

- Full-fledged File System Interface: S3 does not expose the familiar hierarchical directory/folder structure. Each user is given a limited number of buckets, and files are placed in them with unique names. S3 also lacks full-fledged ACLs that can be used to control access by user groups and the public. It is also not user-friendly, because user accounts are identified by system-defined strings, and the concept of a public/anonymous user is not supported. At the API level, S3 does not support the POSIX I/O API that is widely used for block-level programmatic access. Nor does S3 support symbolic links, through which a file can be accessed from more than one path.
- Data Grid Services: S3 is not suitable for integrating and federating with other resources.
  Many tools are available for using S3 as a backup resource, but they do not provide a means of using S3 in a federation of resources. Such a federation would allow multiple resources, including other cloud storage services, to be shared. S3 does not provide any tools for keeping track of replicated and versioned files. Data grids also need data to be transferred at high speed and in parallel, and they need to handle files larger than S3's current 5 GB limit.
- Digital Library Services: S3 does not have metadata support. Descriptive and system metadata are needed to keep additional information about an object (such as engineering, calibration and positional information for a sky image taken by a telescope). This information is useful not only for processing the files but also for discovering them later through multi-dimensional search. Metadata schemas exist for multiple domains, such as FITS metadata for astronomical data, DICOM metadata for medical images, Dublin Core elements for electronic documents and Darwin Core for ecological data. Supporting such schemas for managing the contextual information of the data and enabling discovery is an important aspect of digital libraries.
- Persistent Archive Services: Even though S3 provides a robust storage platform with long-term viability, it does not provide the tools needed to maintain a persistent archive. These include tracking the integrity of digital objects, including chain of custody; packaging information (data and metadata) into bundles and keeping them together for ease of access and archiving; and consistency checking to validate bit-level integrity.

Some of these capabilities are provided by third-party services, but many of these are still insufficient to provide coherent and robust data sharing or to implement a digital library or persistent archiving environment on top of S3. Integrating iRODS on top of S3 realizes these capabilities, making S3 an attractive option for network-based storage.
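
As a rough illustration of the digital-library and persistent-archive services listed above, the following Python sketch keeps attribute-value-unit (AVU) metadata and an ingest-time checksum for each object, supports discovery from metadata alone, and re-verifies bit-level integrity. The class names, paths and attributes are hypothetical, MD5 is used only as an example digest, and the sketch does not reproduce the iCAT catalog or its query language.

```python
"""Toy catalog: descriptive metadata (AVU triplets) plus fixity checking.

Hypothetical illustration of services layered over plain storage; it is not
the iCAT schema or any iRODS API. MD5 is used only as an example digest.
"""
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

AVU = Tuple[str, str, str]  # (attribute, value, unit)


@dataclass
class Entry:
    logical_path: str
    checksum: str                                  # digest recorded at ingest
    avus: List[AVU] = field(default_factory=list)


class ToyCatalog:
    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def register(self, path: str, data: bytes) -> Entry:
        # Record the object and its checksum when it is ingested.
        entry = Entry(path, hashlib.md5(data).hexdigest())
        self.entries[path] = entry
        return entry

    def add_avu(self, path: str, attribute: str, value: str, unit: str = "") -> None:
        # Any number of triplets may be attached to one object.
        self.entries[path].avus.append((attribute, value, unit))

    def query(self, conditions: Dict[str, str]) -> List[str]:
        """Return paths whose metadata matches all attribute=value pairs,
        so discovery does not require reading the stored data itself."""
        hits = []
        for path, entry in self.entries.items():
            pairs = {(a, v) for a, v, _unit in entry.avus}
            if all((a, v) in pairs for a, v in conditions.items()):
                hits.append(path)
        return sorted(hits)

    def verify(self, path: str, data: bytes) -> bool:
        # Fixity check: recompute the digest and compare with the catalog.
        return hashlib.md5(data).hexdigest() == self.entries[path].checksum
```

Querying {"instrument": "camera-A", "exposure": "30"} returns only objects that carry both triplets, and verify returns False as soon as a single byte of the stored copy differs from what was registered.
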
We have designed and implemented such an integrated system and have shown that it is viable and provides value-added services for S3. For our integration, we used the libs3 [25] library provided by a third-party software developer. The software is available as source code and as binaries under the GPL and can be downloaded from Amazon's S3 web site. We chose libs3 because it is written in our language of preference (C) and provided a simple means of integrating S3 with iRODS. The developer of libs3 had the following design goals: to provide a simple and straightforward C API for accessing all of S3's functionality using sequential, blocking requests; not to require the application developer to know anything about the internal S3 interface or about WSDL, HTTP, XML and SSL; to be usable in a multi-threaded environment; and to be usable from applications that connect to S3 multiple times simultaneously. These design goals suited our purpose well, as they aligned with the iRODS design goals of multithreading and sequential, blocking access to files through get/put functionality.

There was, however, one major mismatch between the access functionality provided by S3 (and libs3) and that of iRODS. iRODS provides access similar to POSIX I/O, including file open, seek and close operations, which S3 does not support. iRODS also allows users to access files in blocks; although S3 offers similar functionality through its GetObjectExtended API, using it to read small buffers would make the system very slow. To bridge this gap in functionality, our integration used the "Compound Resource Framework", one of the integration methods in iRODS. The compound resource framework allows one to group multiple resources into a single resource pool. Each resource in the pool has a designated resource type, such as Archive, Permanent, Cache or Volatile. An archive resource is a deep resource (such as tape), possibly with high latency, that is used mainly for archiving files. Buffer-level operations are not allowed on such files; access is mainly through whole-file retrieval. A cache resource, on the other hand, is a low-latency (disk-based) system with high bandwidth and support for parallel I/O. Objects in a cache resource can be purged by the iRODS administrator to recover space. A permanent resource is similar to a cache but is not subject to purging, and a volatile resource is a temporary resource that can purge data without the knowledge of the iRODS system. Synchronization and backup between cache files and archived/permanent files are supported within the compound resource framework, and one can have multiple resources of the same type in a compound resource group.

In this framework, when a user ingests a file into the archive resource, the file is not ingested directly into that resource. Instead, it is automatically diverted into a cache resource. Periodically, or once the full file has been transferred, the file in the cache is synchronized to the archive by making a copy in the archive resource. When a file in the archive is accessed, it is first staged onto the cache resource and then provided to the user. An advantage of this staging is that the mismatch between the bandwidth from the archive resource to the iRODS server and the bandwidth from the iRODS server to the client host is smoothed automatically, keeping the load on, and interactions with, the archive resource to a minimum. For example, if a user accesses a 1 GB file in 10-kilobyte blocks, iRODS brings the whole file from S3 and caches it in the associated disk resource. iRODS then performs the small access operations on the staged file independently of S3.
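
The staging behavior of the compound resource can be sketched in Python under simplifying assumptions: the archive (standing in for S3) supports only whole-object put and get, the cache is an in-memory dictionary, and synchronization is triggered by an explicit sync call rather than periodically or on transfer completion as described above. WholeObjectStore, CompoundResource and their methods are hypothetical names; the sketch does not use libs3 and does not reproduce the iRODS implementation.

```python
"""Sketch of a compound resource: a cache in front of a get/put-only archive.

Hypothetical illustration of the staging idea; the classes are stand-ins and
neither call libs3 nor mirror iRODS internals.
"""
from typing import Dict, Set


class WholeObjectStore:
    """Stand-in for S3: whole-object put/get only, with request counters."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.gets = 0
        self.puts = 0

    def put(self, key: str, data: bytes) -> None:
        self.puts += 1
        self.objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        self.gets += 1
        return self.objects[key]


class CompoundResource:
    """Offers block-level read/write on top of whole-object get/put."""

    def __init__(self, archive: WholeObjectStore) -> None:
        self.archive = archive
        self.cache: Dict[str, bytearray] = {}  # staged copies by path
        self.dirty: Set[str] = set()           # cached files not yet synced

    def _stage(self, path: str) -> bytearray:
        # On first access, copy the whole object from the archive into the cache.
        if path not in self.cache:
            self.cache[path] = bytearray(self.archive.get(path))
        return self.cache[path]

    def read(self, path: str, offset: int, size: int) -> bytes:
        # Small reads are served from the cache, never from the archive.
        buf = self._stage(path)
        return bytes(buf[offset:offset + size])

    def write(self, path: str, offset: int, data: bytes) -> None:
        # Ingest is diverted to the cache; an archived file is staged first.
        if path not in self.cache and path in self.archive.objects:
            self._stage(path)
        buf = self.cache.setdefault(path, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))  # grow the file if needed
        buf[offset:end] = data
        self.dirty.add(path)

    def sync(self, path: str) -> None:
        # Copy a modified cached file to the archive with one whole-object put.
        if path in self.dirty:
            self.archive.put(path, bytes(self.cache[path]))
            self.dirty.discard(path)

    def purge(self, path: str) -> None:
        # Free cache space; sync first so the archive holds the latest copy.
        self.sync(path)
        self.cache.pop(path, None)
```

Reading a 1 MiB object in 10 KB blocks through CompoundResource issues 103 block reads but only one get against the archive, which is the smoothing effect described above; a later sync pushes a modified file back with a single put.
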
With the whole file staged in the cache, S3 does not see the load of serving many small buffers, and the user sees a fast response to her requests.

The integration of S3 and iRODS using the libs3 library allowed us to apply all of the functionality of iRODS to files stored in S3. This includes a rich access control paradigm, ingestion and access policies, replication, versioning, copying and moving, and the other data management features provided by iRODS. More importantly, users can associate metadata with files stored in S3 and use the iRODS query interface to discover and access files using domain-centric metadata.

The integration of the iRODS data grid with cloud storage makes it possible to build a shared collection that spans institutional repositories and cloud storage. A research project can establish a policy for when data will be migrated from the institutional repository into cloud storage, implement criteria for minimizing data flow out of the cloud through management of the cache, and enable discovery of data stored in the cloud without having to access the cloud. This is accomplished through policy-based control of all operations that access the cloud storage. An iRODS policy is expressed as a rule of the form:

Event | Condition | Action-chain | Recovery-chain

Events are defined for each possible interaction with the cloud storage. Examples include:
- putting a file into the cloud storage
- getting a file from the cloud storage
- replicating a file into the cloud storage
- moving a file between storage systems
- copying a file that is in cloud storage
- aggregating files into a container before storage in the cloud
- generating an audit trail of all interactions with the cloud storage

Given an event, a condition can then be specified that must be satisfied before the associated action-chain is executed. Examples of conditions include:
- a test for whether the user's group has permission to use the cloud storage
- a test for which cloud resource is being accessed
- a test for whether the cache in front of the cloud resource is full
- a test for whether the files in the cloud resource have reached a data retention time limit
- a test for whether the data stored in the cloud has reached an allowable maximum
- a test for whether an integrity check on the files in the cloud was done within a desired time period

If the condition evaluates satisfactorily, the appropriate action-chain is executed. Examples include:
- (put a file, valid user access, but quota exceeded): the action-chain stores the file in an alternate resource
- (put a file, but the cache is full): the action-chain identifies the least recently used files and purges them from the cache
- (put a file, but its size is below a minimum): the action-chain aggregates the file into a container on a separate system before loading it into the cache

The policies that control interaction with the cloud storage system can be quite sophisticated and can invoke hierarchical rules to pre-process a file before storage or on access. Because of the cache in front of the cloud storage, pre- and post-processing of files is straightforward.

4. STATUS AND CONCLUSION

The integration of iRODS with cloud storage has been implemented and is part of iRODS release 2.2. Any user with an account in an iRODS system that uses an S3 resource can use it for file storage (provided the user has the appropriate resource-level access permission). We have a system running in our testbed at UCSD that uses S3 as a resource. Some iRODS users (such as the Ocean Observatories Initiative [26]) have expressed interest in using S3 through the iRODS framework.
The system is undergoing testing and has proven robust so far. Because of the success of the Amazon S3 integration, we plan to interface with other cloud storage systems, such as those offered by Google and Microsoft. The main advantage we see in this integration is the ability to perform large-scale data operations (using micro-services) on files stored in S3. In the near future, we propose to integrate Amazon's Elastic Compute Cloud (EC2) [7] with the iRODS rule-based execution environment. With such an integrated system, one can perform operations on large file collections (such as format conversion, image processing, integrity validation and data mining) by storing files in S3 and launching the jobs in EC2.

ACKNOWLEDGMENT

The research results in this paper were funded by the NARA supplement to NSF SCI , "Cyberinfrastructure; From Vision to Reality - Transcontinental Persistent Archive Prototype (TPAP)" ( ), and by the NSF Office of Cyberinfrastructure OCI grant, "NARA Transcontinental Persistent Archive Prototype" ( ). The iRODS technology development has been funded by NSF ITR , "Constraint-based Knowledge Systems for Grids, Digital Libraries, and Persistent Archives" ( ), and NSF SDCI , "SDCI Data Improvement: Data Grids for Community Driven Applications" ( ).

REFERENCES

1. Rajasekar, A., Wan, M., Moore, R., and Schroeder, W. A Prototype Rule-based Distributed Data Management System. HPDC Workshop on Next Generation Distributed Data Management, Paris, France.
2. iRODS: integrated Rule-Oriented Data System.
3. Moore, R.W., and Rajasekar, A. Rule-Based Distributed Data Management. Grid 2007: IEEE/ACM International Conference on Grid Computing.
4. Amazon Simple Storage Service (Amazon S3).
5. Weiss. Computing in the clouds. netWorker, v.11, n.4, p.16-25, December.
6. Buyya, R., Yeo, C., Venugopal, S., Broberg, J., and Brandic, I. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, v.25, n.6, June.
7. Amazon Elastic Compute Cloud (EC2).
8. Google App Engine.
9. Sun network.com (Sun Grid).
10. IBM Cloud Computing.
11. Microsoft Azure.
12. GoGrid Cloud Hosting.
13. Broberg, J., Buyya, R., and Tari, Z. Creating a Cloud Storage Mashup for High Performance, Low Cost Content Delivery. Proc. Service-Oriented Computing - ICSOC 2008 Workshops, LNCS.
14. Nirvanix Storage Delivery Network (SDN).
15. Google Docs.
16. Rackspace Managed Hosting.
17. Foster, I., and Kesselman, C. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1999.
18. Rajasekar, A., Wan, M., Moore, R., Schroeder, W., Kremenek, G., Jagatheesan, A., Cowart, C., Zhu, B., Chen, S.-Y., and Olschanowsky, R. Storage Resource Broker - Managing Distributed Data in a Grid. Computer Society of India Journal, special issue on SAN.
19. Chervenak, A., Foster, I., Kesselman, C., Salisbury, C., and Tuecke, S. The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Data Sets. Journal of Network and Computer Applications: Special Issue on Network-Based Storage Services, vol. 23, no. 3, July.
20. A List of Amazon S3 Backup Tools.
21. NARA Transcontinental Persistent Archive Prototype.
22. Science of Learning Centers.
23. Australian Research Collaboration Service; Davis: A Generic Interface for SRB and iRODS.
24. SHAMAN: Sustaining Heritage Access through Multivalent ArchiviNg.
25. libs3: A C Library API for Amazon S3.
26. Ocean Observatories Initiative.
