Knowledge-based Grids

Reagan Moore
San Diego Supercomputer Center (http://www.npaci.edu/dice/)

Data Intensive Computing Environment
Chaitan Baru, Walter Crescenzi, Amarnath Gupta, Bertram Ludaescher, Richard Marciano, Xufei Qian, Arcot Rajasekar, Michael Wan, Ilya Zaslavsky, Charlie Cowart, Sheau Yen Chen, George Kremenek, Bing Zhu
- Mediation of information
- Web site wrapping
- Rule-based mediation
- Self-instantiating archives
- Knowledge management
- Knowledge mining
- Collection management
- Data handling
- GIS systems
- Browsers
- Digital Embryo project
- Information Power Grid / 2MASS
- Particle Physics Data Grid

Technologies for Managing Storage in the Web
- Grids
- Data Grids
- Digital Libraries
- Persistent Archives
- Knowledge-based Grids

Storage Management
- Logical representations for storage systems
  - Store bits of data
- Logical representations for information repositories
  - Logical representations for collections (information about digital objects)
  - Store attributes about data
- Logical representations for knowledge repositories
  - Store relationships between attributes

Grid Services
- Grids provide access to distributed resources: computing, storage, sensors, display devices
- Middleware services
  - Remote job execution
  - Remote file access
  - Authentication across administration domains
  - Single sign-on
- Examples - Globus, Legion

Globus Layered Grid Architecture (By Analogy to Internet Architecture)
[Figure: Grid layers shown beside the Internet protocol architecture]
Grid layers, top to bottom:
- Application
- User - Specialized services: user- or application-specific distributed services
- Collective - Managing multiple resources: ubiquitous infrastructure services
- Resource - Sharing single resources: negotiating access, controlling use
- Connectivity - Talking to things: communication (Internet protocols) & security
- Fabric - Controlling things locally: access to, & control of, resources
Internet Protocol Architecture layers, top to bottom: Application, Transport, Internet, Link

Data Grid
- Supports management of data objects across a distributed set of storage resources
- Extends Grids to include data management
- Challenges are:
  - Object discovery
  - Managing context for objects (organization into collections)
  - Managing relationships between objects (concept spaces)
  - Integration of collections into data grids
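As a rough illustration of managing data objects across distributed storage resources, the sketch below keeps a small in-memory catalog that maps a logical object name to its collection and to its physical replicas. The class names, resource names, and paths are all hypothetical; this is not the SRB or MCAT interface, only a minimal model of the idea.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """One logical data object and the physical copies that back it."""
    logical_name: str
    collection: str
    # Maps a storage resource name to the physical path on that resource.
    replicas: dict[str, str] = field(default_factory=dict)


class ReplicaCatalog:
    """Hypothetical catalog: logical names -> collection and replicas."""

    def __init__(self) -> None:
        self._objects: dict[str, DataObject] = {}

    def register(self, logical_name, collection, resource, physical_path):
        """Record that `resource` holds a copy of `logical_name`."""
        obj = self._objects.setdefault(
            logical_name, DataObject(logical_name, collection))
        obj.replicas[resource] = physical_path

    def list_collection(self, collection):
        """Object discovery by collection membership."""
        return sorted(o.logical_name for o in self._objects.values()
                      if o.collection == collection)

    def locate(self, logical_name, available):
        """Return (resource, path) for the first replica on a reachable resource."""
        obj = self._objects[logical_name]
        for resource, path in obj.replicas.items():
            if resource in available:
                return resource, path
        raise LookupError(f"no reachable replica for {logical_name}")


# Invented example data: one image replicated on two archives.
catalog = ReplicaCatalog()
catalog.register("sky/img_0001.fits", "sky-survey", "archive-a", "/hpss/a/0001.fits")
catalog.register("sky/img_0001.fits", "sky-survey", "archive-b", "/hpss/b/0001.fits")
replica = catalog.locate("sky/img_0001.fits", available={"archive-b"})
```

The client only ever names the logical object; which archive serves the request is decided by the catalog, which is what lets a collection span administration domains.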

Collection-based Storage
- Access millions to billions of data objects within a collection
- Astronomy sky surveys - 2-Micron All Sky Survey
  - 5 million images, 10 TB of data
- Access requirements
  - Replicate between two HPSS archives
  - Provide access to individual images
  - Provide access to thousands of images

Linking Collections with Data Grids
[Figure: data flow between a logical collection and the grid; labels in reading order]
- Logical collection - elements - attributes
- Available transforms; derived data process metadata; derived data metadata
- Export elements & attributes
- Transforms on elements; derived data products
- Grid container: logical name, container metadata, element attributes, (data model), elements
- Mapping of logical containers to physical files
- Import into existing or new logical collection
- Grid metadata catalog
- Grid replica catalog
- Logical collection

Digital Libraries
- Provide services to discover, access, manipulate information organized in collections
- Discover
  - Digital library standards for provenance metadata - Dublin Core
  - Information catalog characterization - MCAT
  - Schema composition for extensible discipline attributes - EMCAT
  - Discovery mechanisms based on XML syntax - XQuery
- Access
  - Metadata delivery mechanisms for information content using XML - SDLIP
- Manipulate
  - Extensions to XQuery - for manipulation of scientific data

[Figure: NSDL architecture diagram; recoverable labels]
- Information about collections; referenced items & collections
- Portals & Clients
- NSDL Collections; Virtual Collections & Mediators
- User Interfaces: Delivery, Presentation, Aggregation - Channels
- Core NSDL Bus: meta-data delivery, data delivery, query, global IDs, security, network
- Core Collection-Building Services: metadata normalizing, harvesting, persistent storage
- Usage Enhancement; NSDL Services; Other Services; metadata & data access-based services
- Core CI Services: annotation, query, transform, topic-map, registry, personalization, discussion, visualization...

Persistent Archives
- Provide interoperability mechanisms to migrate collections from old technologies to new technologies
- Requires ability to migrate across:
  - Media
  - Storage systems
  - Collections
  - Information markup language standards

ERA: Archival Components Concept
[Figure: ERA concept model; recoverable labels]
- Components: Accessioning Workbench, Archival Repository, Reference Workbench
- Infrastructure: Grid Security Infrastructure, Storage Resource Broker / Extensible Meta-data CATalog, Tapes, Disks, Internet
- Operations: Accession, Verify, Wrap & Containerize, Describe, Query, Present, Rebuild (applied to Collection, Metadata, Records, Schedules)
- Mediation of Information using XML
- Order Fulfillment System, Archival Research Catalog

Evolution of Grids
- File-based access
  - Digital objects identified by path name
- Collection-based access
  - Digital objects identified by collection attributes
- Knowledge-based access
  - Digital objects identified by domain concepts
  - Map from concepts used by a discipline to collection attributes to local file name
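The three access modes can be read as successive lookups: a concept resolves to collection attributes, attributes resolve to logical names, and logical names resolve to local files. The sketch below walks that chain with plain dictionaries. Every concept, attribute, and path in it is invented for illustration and does not come from any real catalog.

```python
# Hypothetical catalogs; all names and values are invented for illustration.

# Knowledge level: a discipline concept maps to collection attributes.
CONCEPT_TO_ATTRIBUTES = {
    "infrared galaxy image": {"survey": "2MASS", "object_type": "galaxy"},
}

# Information level: attributes describing each object in the collection.
COLLECTION = [
    {"survey": "2MASS", "object_type": "galaxy", "logical_name": "img_0001"},
    {"survey": "2MASS", "object_type": "star", "logical_name": "img_0002"},
]

# Data level: logical names map to local file names.
LOGICAL_TO_FILE = {
    "img_0001": "/archive/2mass/0001.fits",
    "img_0002": "/archive/2mass/0002.fits",
}


def by_attributes(query):
    """Collection-based access: find logical names whose attributes match."""
    return [row["logical_name"] for row in COLLECTION
            if all(row.get(key) == value for key, value in query.items())]


def by_concept(concept):
    """Knowledge-based access: concept -> attributes -> logical name -> file."""
    attributes = CONCEPT_TO_ATTRIBUTES[concept]
    return [LOGICAL_TO_FILE[name] for name in by_attributes(attributes)]


paths = by_concept("infrared galaxy image")
```

File-based access corresponds to using the final paths directly; each higher level adds one mapping on top without changing the levels below it.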

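Knowledge Based Grid
[Figure: grid of three levels (Knowledge, Information, Data) against three functions (Ingestion, Management, Access)]
- Knowledge: Relationships Between Concepts (ingestion); Knowledge Repository for Rules (management); Knowledge or Topic-Based Query / Browse (access)
  - Interface labels: XTM DTD; Rules - KQL
- Information: Attributes, Semantics (ingestion); Information Repository (management); Attribute-based Query (access)
  - Interface labels: (Topic Maps / Model-based Access); XML DTD; SDLIP
- Data: Fields, Containers, Folders (ingestion); Storage (Replicas, Persistent IDs) (management); Feature-based Query (access)
  - Interface labels: (Data Handling System - SRB); MCAT/HDF; Grids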

Common Web Storage Management Hierarchy
- Knowledge-based Grids - Concept based access
- Data Grid - Access to data across administration domains
- Digital Library - Services applied to information
- Data Collection - Manage information
- Data handling - Manage access to storage systems
- Persistent Archives - Manage evolution of software and hardware storage systems

[Figure: layered architecture diagram; labels in reading order]
- Portals and Workbenches
- Data Grids
- Knowledge & Resource Management: Concept space, Grid Security, Caching, Replication, Backup, Scheduling
- Metadata View: Information Discovery, Catalog Mediator, Metadata delivery, Catalog Analysis
- Data View: Data Discovery, Bulk Data Analysis, Data mediator, Data Delivery
- Standard APIs and Protocols
- Standard Metadata format, Data model, Wire format
- Catalog/Image Specific Access
- Compute Resources, Derived Collections, Catalogs, Data Archives

Collection Interactions
- Provide a logical representation for a collection (schema, table structure)
- Register a collection as an object within a grid
- Dynamically generate SQL commands for attribute-value based discovery
- Export data elements from collection into containers
- Manipulate containers (replicate, cache, transport)
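Attribute-value based discovery can be sketched as building a parameterized SQL statement from a list of conditions. The example below uses Python's standard `sqlite3` module and an invented `objects` table; the table, its columns, and the helper `build_query` are assumptions made for illustration, not the catalog schema used by the authors.

```python
import sqlite3

# Hypothetical schema: the columns a client may filter on, and allowed operators.
ALLOWED_COLUMNS = {"survey", "band", "size_mb"}
ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def build_query(conditions):
    """Turn (column, operator, value) triples into SQL text plus parameters.

    Column names and operators are checked against whitelists; values are
    passed as bound parameters rather than pasted into the SQL string.
    """
    clauses, params = [], []
    for column, op, value in conditions:
        if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
            raise ValueError(f"unsupported condition: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) or "1=1"
    sql = f"SELECT logical_name FROM objects WHERE {where} ORDER BY logical_name"
    return sql, params


# Invented example collection held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (logical_name TEXT, survey TEXT, band TEXT, size_mb REAL)")
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", [
    ("img_0001", "2MASS", "J", 2.0),
    ("img_0002", "2MASS", "K", 2.5),
    ("img_0003", "other", "J", 1.0),
])

sql, params = build_query([("survey", "=", "2MASS"), ("band", "=", "J")])
matches = [row[0] for row in conn.execute(sql, params)]
```

Generating the statement from the collection's logical representation means the client never needs to know the physical table layout, only the attribute names the collection exposes.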

SDSC Storage Resource Broker & Meta-data Catalog
[Figure: SRB architecture diagram; recoverable labels]
- Application interfaces: C, C++, Linux I/O; Unix Shell; Java, NT Browsers; Prolog Predicate; Web
- SRB
- MCAT: Resource, User; User Defined; Dublin Core; Application Meta-data
- Archives: HPSS, ADSM, UniTree, DMF
- HRM
- File Systems: Unix, NT, Mac OSX
- Databases: DB2, Oracle, Postgres
- Third-party copy; Remote Proxies; DataCutter

Table Access Interface
- Facility to access tabular data using SRB API
- View SQL queries as Locators (Path Names or URI)
- Apply open, close, read, write operations
- Provide for very general queries to specific queries
  - any query on a database to soft queries to hard-coded queries
- Access Result Table as a Stream
- Provide Server-side operations to present results
  - Forms, HTML, XML, Data Wetting, Charting, Visualization
- Multi-modal Ingestion
  - SQL ingestion
  - Packed Ingestion - useful in data movement and replication
  - Directly ingest data marked by HTML, XML,...
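The idea of treating a SQL query as a locator that can be opened, read, and closed like a file might look roughly like the sketch below. It uses Python's standard `sqlite3` module; the `sql:` locator prefix, the `QueryStream` class, and the `images` table are all hypothetical and are not part of the SRB API.

```python
import sqlite3


class QueryStream:
    """Hypothetical read-only, file-like view of a SQL result table."""

    def __init__(self, conn, sql):
        self._cursor = conn.execute(sql)
        self._closed = False

    def read(self, n_rows=1):
        """Return up to n_rows rows as comma-separated lines; '' at end of data."""
        if self._closed:
            raise ValueError("read on closed stream")
        rows = self._cursor.fetchmany(n_rows)
        return "".join(",".join(str(v) for v in row) + "\n" for row in rows)

    def close(self):
        self._cursor.close()
        self._closed = True


def open_locator(conn, locator):
    """Open a locator of the assumed form 'sql:<query text>' as a stream."""
    scheme, _, sql = locator.partition(":")
    if scheme != "sql" or not sql:
        raise ValueError(f"unsupported locator: {locator}")
    return QueryStream(conn, sql)


# Invented example table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (name TEXT, band TEXT)")
conn.executemany("INSERT INTO images VALUES (?, ?)",
                 [("a.fits", "J"), ("b.fits", "K")])

stream = open_locator(conn, "sql:SELECT name, band FROM images ORDER BY name")
first = stream.read(1)    # one row
rest = stream.read(10)    # the remaining rows
at_end = stream.read(1)   # empty string once the result is exhausted
stream.close()
```

Because rows are fetched in chunks, the client can consume a large result table incrementally instead of materializing it, which is the point of presenting the result as a stream.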

Shadow Objects
- A feature for registering partial physical locations
- Partial path in a file system allows one to access files under a directory
- Partial SQL query allows for modification at access time
- Registering a null query allows any query to be run against a database
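One way to picture shadow objects is as registered prefixes that are completed at access time. The sketch below models both cases named on the slide: a partial path that exposes files under a directory, and a partial SQL query, including the null query. The classes `ShadowDirectory` and `ShadowQuery` are hypothetical names for this illustration, not SRB types.

```python
import posixpath


class ShadowDirectory:
    """Hypothetical registration of a partial path (a directory)."""

    def __init__(self, root):
        self.root = posixpath.normpath(root)

    def resolve(self, relative):
        """Complete the partial path, refusing anything outside the directory."""
        full = posixpath.normpath(posixpath.join(self.root, relative))
        if full != self.root and not full.startswith(self.root + "/"):
            raise PermissionError(f"{relative} is outside {self.root}")
        return full


class ShadowQuery:
    """Hypothetical registration of a partial SQL query.

    An empty (null) registration means the caller supplies the whole query.
    """

    def __init__(self, partial_sql=""):
        self.partial_sql = partial_sql.strip()

    def complete(self, remainder):
        """Append the access-time text to the registered prefix."""
        return f"{self.partial_sql} {remainder}".strip()


# Invented example registrations.
survey_dir = ShadowDirectory("/data/survey")
image_path = survey_dir.resolve("night1/img.fits")

band_query = ShadowQuery("SELECT name FROM images WHERE")
j_band_sql = band_query.complete("band = 'J'")

any_query = ShadowQuery()  # null query: any statement is allowed
free_sql = any_query.complete("SELECT COUNT(*) FROM images")
```

Completing a query by string concatenation is shown only to mirror the slide; a real service would need to validate or parameterize the access-time portion before running it.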

Server-side Presentation
- Markup data before sending to client
  - Generic markup - HTML, XML
  - Specific markup - Template
- Template Language
  - Allows data element variables
  - Control structure - if-then-else, for, nested
  - Object-in-object
- User specifies mark up at query time
- Can be used for other data streams also!
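A minimal sketch of template-driven markup on the server side follows, assuming the user supplies the templates together with the query. It uses Python's standard `string.Template` for data element variables, a loop for the "for" construct, and a branch for "if-then-else". The template syntax of the actual system is not given on the slide, so the `$name` placeholders here are an assumption.

```python
from html import escape
from string import Template


def render(rows, row_template, empty_template, wrapper):
    """Mark up result rows on the server before sending them to the client.

    rows           -- list of dicts, one per result row
    row_template   -- markup for one row, with $column placeholders
    empty_template -- markup used when there are no rows (the else branch)
    wrapper        -- outer markup with a $body placeholder
    """
    if rows:
        # "for" construct: apply the row template to every row, escaping values.
        body = "".join(
            Template(row_template).substitute(
                {key: escape(str(value)) for key, value in row.items()})
            for row in rows)
    else:
        body = empty_template
    return Template(wrapper).substitute(body=body)


# Invented example result rows and user-supplied templates.
rows = [{"name": "img_0001", "band": "J"}, {"name": "img_0002", "band": "K"}]
page = render(rows, "<li>$name ($band)</li>", "<li>no results</li>", "<ul>$body</ul>")
empty_page = render([], "<li>$name ($band)</li>", "<li>no results</li>", "<ul>$body</ul>")
```

Since the templates are ordinary arguments, the same routine can mark up any stream of records, which matches the slide's remark that the mechanism applies to other data streams as well.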

Information Management Projects
- Digital Libraries
  - NSF Digital Library Initiative, Phase II - UCSB, Stanford
  - Digital Embryo digital library - GMU
  - NPACI Digital Sky - Caltech 2MASS sky survey
  - CDL - AMICO
  - NSF NSDL - UCAR / DLESE
- Grid Environments
  - NASA Information Power Grid - NASA Ames
  - DOE Data Visualization Corridor - LLNL
  - DOE Particle Physics Data Grid - Stanford, Caltech
  - NSF Grid Physics Network - U Fl
- Persistent Archives
  - NARA Persistent Archive
  - NHPRC - Scalable archives

Further Information
http://www.npaci.edu/dice