The AMGA Metadata Service

Antonio Calanducci, National Institute of Nuclear Physics (INFN), Catania
EGEE Grid tutorial for Users and Sysadmin, Barcelona, 14th-18th April 2008
www.eu-egee.org
Enabling Grids for E-sciencE. EGEE and gLite are registered trademarks.

Contents
- Metadata services: background and motivation
- Architecture and features of AMGA
- Grid DB access with AMGA
- Use cases

Why does the Grid need metadata?
- Grids make it possible to store millions of files spread over several storage sites.
- Users and applications need an efficient mechanism to describe files and to locate files based on their contents.
- This is achieved by associating descriptive attributes with files (metadata is data about data) and by answering user queries against the associated information.

Basic Metadata Concepts
- Entry: the representation of a real-world entity to which we attach metadata in order to describe it
- Attribute: a key/value pair
  - Type: the type (int, float, string, ...)
  - Name/Key: the name of the attribute
  - Value: the value of an entry's attribute
- Schema: a set of attributes
- Collection: a set of entries associated with a schema
- Metadata: the list of attributes (including their values) associated with entries
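The concepts above can be sketched as a small in-memory model. This is only an illustration of the vocabulary (entry, attribute, schema, collection), not AMGA code; the class name `Collection` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Collection:
    """A set of entries that share one schema (hypothetical model, not AMGA's API)."""
    name: str
    # Schema: attribute name -> attribute type
    schema: Dict[str, type] = field(default_factory=dict)
    # Entries: entry name -> {attribute name: value}
    entries: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def add_attribute(self, key: str, attr_type: type) -> None:
        """Add one attribute (name plus type) to the schema."""
        self.schema[key] = attr_type

    def add_entry(self, entry_name: str, **values: Any) -> None:
        """Add an entry; every value must match an attribute of the schema."""
        for key, value in values.items():
            if key not in self.schema:
                raise KeyError(f"attribute {key!r} is not in the schema")
            if not isinstance(value, self.schema[key]):
                raise TypeError(f"attribute {key!r} expects {self.schema[key].__name__}")
        self.entries[entry_name] = dict(values)

    def metadata(self, entry_name: str) -> Dict[str, Any]:
        """Metadata of an entry: its attributes together with their values."""
        return self.entries[entry_name]


jobs = Collection("/jobs")
jobs.add_attribute("jobstatus", int)
jobs.add_entry("job1", jobstatus=0)
print(jobs.metadata("job1"))
```

Keeping the schema on the collection rather than on each entry mirrors the definition of a collection as a set of entries associated with one schema.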

Example: Movie Trailers
- Movie trailer files (entries) are saved on Grid Storage Elements and registered in the File Catalogue.
- We want to add metadata to describe the movie content.
- A possible schema:
  - Title: varchar
  - Runtime: int
  - Cast: varchar
  - LFN: varchar
- A metadata catalogue will be the repository of the movies' metadata and will make it possible to find movies that satisfy users' queries.

Trailers example
Collection: /trailers

| Entry name | Title | Runtime | Cast | LFN |
| 8c3315c1-811f-4823-a778-60a203439689 | My Best Friend's Wedding | 80 | Julia Roberts | lfn:/grid/gilda/movies/mybfwed.avi |
| 51a18b7a-fd21-4b2c-aa74-4c53ee64846a | Spider-man 2 | 120 | Kirsten Dunst | lfn:/grid/gilda/movies/spiderman2.avi |
| 401e6df4-c1be-4822-958c-ce3eb5c54fcb | The Godfather | 113 | Al Pacino | lfn:/grid/gilda/movies/godfather.avi |

- Schema: the header row (Title, Runtime, Cast, LFN)
- Attribute: one column of the schema
- Entries: the rows, identified by their entry names
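As a rough illustration of what "finding movies satisfying user queries" means for this collection, the sketch below holds the three example entries as plain Python dictionaries and filters them on one attribute. The helper `select` is hypothetical and only mimics the idea of selecting attributes under a condition; it is not how AMGA evaluates queries.

```python
# The /trailers collection from the example, as a list of entries.
trailers = [
    {"entry": "8c3315c1-811f-4823-a778-60a203439689",
     "Title": "My Best Friend's Wedding", "Runtime": 80,
     "Cast": "Julia Roberts", "LFN": "lfn:/grid/gilda/movies/mybfwed.avi"},
    {"entry": "51a18b7a-fd21-4b2c-aa74-4c53ee64846a",
     "Title": "Spider-man 2", "Runtime": 120,
     "Cast": "Kirsten Dunst", "LFN": "lfn:/grid/gilda/movies/spiderman2.avi"},
    {"entry": "401e6df4-c1be-4822-958c-ce3eb5c54fcb",
     "Title": "The Godfather", "Runtime": 113,
     "Cast": "Al Pacino", "LFN": "lfn:/grid/gilda/movies/godfather.avi"},
]


def select(entries, attributes, condition):
    """Return the requested attributes of every entry that satisfies the condition."""
    return [tuple(e[a] for a in attributes) for e in entries if condition(e)]


# Titles and LFNs of all trailers longer than 100 (Runtime units as in the table).
long_trailers = select(trailers, ["Title", "LFN"], lambda e: e["Runtime"] > 100)
for title, lfn in long_trailers:
    print(title, lfn)
```

The LFN returned by the query is what a client would then hand to the File Catalogue to locate the actual file.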


Metadata on the Grid
- Information about files, but not only that:
  - metadata can describe any grid entity/object
  - e.g. JobIDs: add logging information to your jobs
- Monitoring of running applications:
  - e.g. ongoing results from running jobs can be published on the metadata server
- Information exchange among grid peers:
  - e.g. producer/consumer job collections: master jobs produce data to be analyzed; slave jobs query the metadata server to retrieve input to consume
- Simplified DB access on the grid:
  - grid applications that need structured data can model their data schemas as metadata

Monitoring of running applications
[Figure: a scientist/developer submits jobs through the Workload Manager to a CE and its worker nodes (WN), with an SE for storage; the jobs publish results to the /results collection of the Metadata Catalogue, which shows them to the customer/scientist as they are produced.]

Using a metadata service to exchange data among running jobs
- Suppose we have two sets of jobs:
  - Producers: they generate a file, store it on an SE, and register it in the LFC File Catalogue, assigning it an LFN
  - Consumers: they take an LFN, download the file and process it
- A metadata collection can be used to share the information generated by the producers; it can act as a bag of LFNs (bag-of-tasks model) from which consumers fetch files for further processing
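The bag-of-LFNs pattern can be sketched with a thread-safe queue standing in for the metadata collection. Everything here is an assumption made for illustration: the queue replaces the /bag-of-lfns collection, threads replace grid jobs, and the LFNs are made up. A real setup would add and fetch entries through an AMGA client instead.

```python
import queue
import threading


def run_demo(n_producers=2, files_per_producer=3, n_consumers=2):
    """Run producers and consumers that share one bag of LFNs; return what was consumed."""
    bag_of_lfns = queue.Queue()   # stands in for the /bag-of-lfns collection
    processed = []                # LFNs handled by the consumers

    def producer(job_id):
        for i in range(files_per_producer):
            # Hypothetical LFN; a real producer would first store the file on an SE
            # and register it in the file catalogue.
            bag_of_lfns.put(f"lfn:/grid/example/producer{job_id}/file{i}.dat")

    def consumer():
        while True:
            lfn = bag_of_lfns.get()
            if lfn is None:       # sentinel: no more work
                break
            processed.append(lfn)  # a real consumer would download and process the file

    consumers = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    producers = [threading.Thread(target=producer, args=(j,)) for j in range(n_producers)]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:
        bag_of_lfns.put(None)     # one sentinel per consumer
    for t in consumers:
        t.join()
    return processed


if __name__ == "__main__":
    print(sorted(run_demo()))
```

Producers and consumers never talk to each other directly; the shared collection is the only coupling, which is what lets the two sets of jobs run on different worker nodes.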

Information exchange among grid peers
[Figure: a scientist/developer submits jobs through the Workload Manager to the CEs and their worker nodes (WN); producer jobs store files on an SE and put LFNs into the /bag-of-lfns collection of the Metadata Catalogue, and consumer jobs fetch LFNs from it.]

The AMGA Metadata Catalogue
- Official metadata service for the gLite middleware
- AMGA: ARDA Metadata Grid Application
- Provides a complete but simple interface, so that all users can use it easily
- Designed with scalability in mind, in order to deal with large numbers of entries
- Grid security is provided to grant different access levels to different users
- Flexible, with support for dynamic schemas, in order to serve several application domains
- Allows hierarchical metadata schemas

AMGA Analogies
- Analogy to the RDBMS world:
  - schema <-> table schema
  - collection <-> db table
  - attribute <-> schema column
  - entry <-> table row/record
- Analogy to a file system:
  - collection <-> directory
  - entry <-> file
- Example (AMGA command, with the SQL equivalent in parentheses):
    createdir /jobs    (create table jobs)
    addattr /jobs jobstatus int    (alter table jobs add column jobstatus int)
    addentry /jobs/job1 jobstatus 0    (insert into jobs (jobstatus) values (0))
    updateattr /jobs jobstatus 1 jobid>100    (update jobs set jobstatus=1 where jobid>100)
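The SQL side of the analogy can be tried out with Python's built-in `sqlite3` module. This is a sketch of the relational equivalents only, not of AMGA itself. Two things are assumptions added to make it runnable: the `entry` column (SQL needs at least one column to create a table, and it plays the role of the entry name) and the `jobid` column, which the update condition refers to but the example never defines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# createdir /jobs
cur.execute("CREATE TABLE jobs (entry TEXT PRIMARY KEY)")
# addattr /jobs jobstatus int  (plus an assumed jobid attribute)
cur.execute("ALTER TABLE jobs ADD COLUMN jobstatus INT")
cur.execute("ALTER TABLE jobs ADD COLUMN jobid INT")
# addentry /jobs/job1 jobstatus 0 ...
cur.execute("INSERT INTO jobs (entry, jobstatus, jobid) VALUES ('job1', 0, 101)")
cur.execute("INSERT INTO jobs (entry, jobstatus, jobid) VALUES ('job2', 0, 50)")
# updateattr /jobs jobstatus 1 jobid>100
cur.execute("UPDATE jobs SET jobstatus = 1 WHERE jobid > 100")
conn.commit()

rows = cur.execute("SELECT entry, jobstatus FROM jobs ORDER BY entry").fetchall()
print(rows)
```

Only the entry whose `jobid` exceeds 100 has its status changed, which is the behaviour the `updateattr` condition describes.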

AMGA Features
- Dynamic schemas
  - Schemas can be modified at runtime by the client
  - Create, delete schemas
  - Add, remove attributes
- AMGA collections are hierarchically organized
  - Collections can contain sub-collections
  - Sub-collections can inherit/extend the parent collection's schema
- Flexible queries
  - SQL-like query language
  - Different join types (inner, outer, left, right) between schemas are provided
    selectattr /glibrary:filename /glaudio:author /glaudio:album '/glibrary:file=/glaudio:file and like(/glibrary:filename, "%.mp3")'
- Support for views, constraints, indexes
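How a sub-collection might inherit and extend its parent's schema can be pictured with a few lines of Python. This is a conceptual sketch under assumed semantics (the child sees the parent's attributes plus its own, and changes to the parent at runtime become visible to the child); the class, its methods and the collection names /media and /media/audio are hypothetical and do not describe AMGA's internal implementation.

```python
class Collection:
    """A collection whose schema is its parent's schema plus its own attributes."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.own_attributes = {}   # attribute name -> type name

    def add_attr(self, key, type_name):
        """Dynamic schema: attributes can be added at runtime."""
        self.own_attributes[key] = type_name

    def remove_attr(self, key):
        """Dynamic schema: attributes can be removed at runtime."""
        del self.own_attributes[key]

    def schema(self):
        """Effective schema: inherited attributes first, then the collection's own."""
        inherited = self.parent.schema() if self.parent else {}
        return {**inherited, **self.own_attributes}


media = Collection("/media")
media.add_attr("filename", "varchar")

audio = Collection("/media/audio", parent=media)
audio.add_attr("author", "varchar")
audio.add_attr("album", "varchar")

print(audio.schema())
```

Computing the effective schema on demand, rather than copying it into the child, is one simple way to make later changes to the parent show up in every sub-collection.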

AMGA Security
- Unix-style permissions: users and groups
- ACLs: per-collection or per-entry
- Secure client/server connections: SSL
- Client authentication based on:
  - Username/password
  - General X509 certificates (DN based)
  - Grid-proxy certificates (DN based)
- VOMS support:
  - VO attribute maps to a defined AMGA user
  - VOMS Role maps to a defined AMGA user
  - VOMS Group maps to a defined AMGA group
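The combination of Unix-style permissions and ACLs can be illustrated with a tiny permission check. This is a minimal sketch assuming that an ACL maps a user or group name to a string of permission letters and that access is granted if any matching principal carries the requested letter; the function `allowed`, the user names and the exact evaluation rule are assumptions, and AMGA's real rules may differ.

```python
def allowed(acl, user, groups, permission):
    """Return True if the user, or any of the user's groups, holds the permission.

    acl        -- dict mapping a user or group name to a string such as "rwx"
    user       -- the AMGA user the client was mapped to
    groups     -- the AMGA groups the client was mapped to (e.g. from VOMS groups)
    permission -- a single letter: "r", "w" or "x"
    """
    principals = [user] + list(groups)
    return any(permission in acl.get(p, "") for p in principals)


# Hypothetical ACL of one collection.
acl = {"root": "rwx", "gilda:users": "rx"}

print(allowed(acl, "alice", ["gilda:users"], "r"))  # group grants read
print(allowed(acl, "alice", ["gilda:users"], "w"))  # nobody grants alice write
print(allowed(acl, "root", [], "w"))                # owner-style full access
```

Because the check works on mapped AMGA users and groups, it is independent of how the client authenticated (password, X509 certificate or VOMS proxy).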

AMGA Implementation
- C++ multiprocess server
- Back ends: Oracle, MySQL 4/5, PostgreSQL, SQLite
- Front ends:
  - TCP text streaming
    - High performance
    - Client API for C++, Java, Python, Perl, PHP
  - SOAP (web services)
    - Interoperability
    - Scalability
- AMGA server runs on SLC3/4, Fedora Core, Gentoo, Debian
- Standalone Python library implementation
  - Data stored on the file system

AMGA Datatypes
- Using the above datatypes, you can be sure that your metadata can easily be moved to all supported back ends.
- If you do not care about DB portability, you can in principle use as an entry attribute type any datatype supported by the back end, even the more esoteric ones (e.g. the PostgreSQL network address or geometric types).

Accessing AMGA from UI/WNs
- TCP streaming front end
  - mdcli & mdclient CLI and C++ API (md_cli.h, MD_Client.h)
  - Java client API and command line: mdjavaclient.sh & mdjavacli.sh (also under Windows)
  - Python and Perl client API
  - PHP client API (new): developed entirely by the GILDA team, INFN CT
  - AMGA Web Interface (AMGA WI) (new)
    - Developed entirely by the GILDA team, INFN CT
    - Based on the standard AMGA Java APIs
    - Web application using standards such as JSP custom tags and servlets
- SOAP front end (WSDL)
  - C++: gSOAP
  - Java: AXIS
  - Python: ZSI

AMGA Web Interface
[Screenshots: collection management, modify schema, instance view, delete entry.]

AMGA WI Tool Bars
[Screenshot: address bar (type a collection name, then Go!), back to parent, add collection / new collection, bulk upload, modify schema, new entry, search entry, ACL management.]

Advanced features: Metadata Replication
- AMGA provides replication/federation mechanisms
- Motivation
  - Scalability: support hundreds/thousands of concurrent users
  - Geographical distribution: hide network latency
  - Reliability: no single point of failure
  - DB-independent replication: heterogeneous DB systems
  - Disconnected computing: off-line access (laptops)
- Architecture
  - Asynchronous replication
  - Master-slave: writes only allowed on the master
  - Application-level replication: replicates metadata commands
  - Partial replication: supports replication of only sub-trees of the metadata hierarchy
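The architecture described above (asynchronous, master-slave, command-level, optionally restricted to a sub-tree) can be sketched as a command log that slaves replay when they synchronize. This is a toy model under stated assumptions, not AMGA's replication protocol: the classes `Master` and `Slave`, the log format and the `sync` method are all hypothetical.

```python
def in_subtree(path, subtree):
    """True if path is the sub-tree root itself or lies below it."""
    root = subtree.rstrip("/")
    return path == root or path.startswith(root + "/")


class Master:
    """Accepts writes and records every metadata command in a log."""

    def __init__(self):
        self.log = []   # list of (path, command) tuples

    def execute(self, path, command):
        self.log.append((path, command))


class Slave:
    """Replays the master's commands later, optionally for one sub-tree only."""

    def __init__(self, subtree="/"):
        self.subtree = subtree
        self.position = 0    # how much of the master's log has been seen
        self.applied = []    # commands replayed locally

    def sync(self, master):
        pending = master.log[self.position:]
        self.position = len(master.log)
        for path, command in pending:
            if in_subtree(path, self.subtree):   # partial replication filter
                self.applied.append(command)
        return len(pending)


master = Master()
master.execute("/jobs", "createdir /jobs")
master.execute("/jobs", "addattr /jobs jobstatus int")
master.execute("/trailers", "createdir /trailers")

full = Slave()            # full replication
partial = Slave("/jobs")  # partial replication of the /jobs sub-tree
full.sync(master)
partial.sync(master)

# A later write reaches the slaves only at their next sync (asynchronous).
master.execute("/jobs/job1", "addentry /jobs/job1 jobstatus 0")
print(len(partial.applied))   # still 2 until partial.sync(master) is called
```

Replaying commands rather than copying database rows is what makes this style of replication independent of the back end each replica uses.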

Metadata Replication: Use cases
- Full replication
- Partial replication
- Federation
- Proxy

Existing DB access with AMGA
- Since AMGA 1.2.10, a new import feature allows access to existing DB tables.
- Once you have imported into AMGA the tables from one or more DBs that you want to access through it, you can exploit many of AMGA's features for your existing tables.
- Advantages: your DB tables can be accessed by grid users/applications, using grid authentication (VOMS proxies) and authorization with ACLs.

Set up AMGA to access your tables
- Remember: AMGA stores its own tables in its DB back end
- To access an existing DB you have two options:
  - import the tables of the DB you want to access into the AMGA DB back end
  - vice versa, add the AMGA DB back-end tables to the DB you want to access
- As root, use the import command to mount your tables into the AMGA collection hierarchy:
    Query> whoami
    >> root
    Query> createdir /world
    Query> cd /world/
    Query> import world.city /world/city
    Query> import world.country /world/country
    Query> import world.countrylanguage /world/countrylanguage

Set up AMGA to access your tables
- Properly set up authorization on the imported tables:
    Query> acl_remove /world/city/ system:anyuser
    Query> acl_remove /world/country system:anyuser
    Query> acl_add /world/ gilda:users rx
    Query> acl_show /world
    >> root rwx
    >> gilda:users rx
    >> system:anyuser rx
    Query> selectattr City:CountryCode City:Name 'like(city:name, "Am%") limit 5'
    >> NLD
    >> Amsterdam
    >> NLD
    >> Amersfoort
    >> BRA
    >> Americana
    >> ECU
    >> Ambato
    >> IDN
- More information on existing DB access:
  - http://amga.web.cern.ch/amga/importing.html
  - https://grid.ct.infn.it/twiki/bin/view/gilda/amgadbaccess
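For readers more familiar with SQL, the `selectattr` query on the imported table corresponds roughly to a `SELECT ... WHERE ... LIKE ... LIMIT` statement. The sketch below runs that statement with Python's `sqlite3` on a tiny in-memory stand-in for the `city` table. The four matching rows are the ones shown in the output above; the Rotterdam row is added here purely as a non-matching example, and the exact SQL that AMGA sends to its back end is not claimed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (Name TEXT, CountryCode TEXT)")
conn.executemany(
    "INSERT INTO city (Name, CountryCode) VALUES (?, ?)",
    [
        ("Amsterdam", "NLD"),
        ("Amersfoort", "NLD"),
        ("Americana", "BRA"),
        ("Ambato", "ECU"),
        ("Rotterdam", "NLD"),   # added for illustration: does not match 'Am%'
    ],
)

# Rough SQL counterpart of:
#   selectattr City:CountryCode City:Name 'like(city:name, "Am%") limit 5'
rows = conn.execute(
    "SELECT CountryCode, Name FROM city WHERE Name LIKE 'Am%' LIMIT 5"
).fetchall()

for country_code, name in rows:
    print(country_code, name)
```

The AMGA client prints each attribute value on its own line, which is why the output above alternates between country codes and city names.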

Early adopters of AMGA
- LHCb bookkeeping
  - Migrated bookkeeping metadata to the ARDA prototype: 20M entries, 15 GB
  - Large amount of static metadata
  - Feedback valuable in improving the interface and fixing bugs
  - AMGA showing good scalability
- Ganga
  - Job management system developed jointly by ATLAS and LHCb
  - Uses AMGA for storing information about job status
  - Small amount of highly dynamic metadata

Biomed: MDM (Medical Data Manager)
- MDM
  - Stores and accesses medical images and associated metadata on the Grid
  - Built on top of the gLite 1.5 data management system
  - Demonstrated at the last EGEE conference (October 05, Pisa)
- Strong security requirements
  - Patient data is sensitive
  - Data must be encrypted
  - Metadata access must be restricted to authorized users
- AMGA used as metadata server
  - Demonstrates authentication and encrypted access
  - Used as a simplified DB
- More details at https://uimon.cern.ch/twiki/bin/view/egee/dmencryptedstorage

gMOD: grid Movie On Demand
- gMOD provides a Video-On-Demand service.
- The user chooses from a list of videos, and the chosen one is streamed in real time to the video client on the user's workstation.
- For each movie, many details (Title, Runtime, Country, Release Date, Genre, Director, Cast, Plot Outline) are stored, and users can search for a particular movie by querying on one or more attributes.
- Two kinds of users can interact with gMOD:
  - TrailersManagers, who can administer the database of movies (uploading new ones and attaching metadata to them);
  - GILDA VO users (guests), who can browse, search and choose a movie to be streamed.
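Searching on one or more attributes can be pictured as follows. This is a minimal Python sketch over an in-memory list, not how gMOD queries AMGA: the `Movie` class, the `search` function and the sample entries are all invented for illustration, and only a few of the attributes listed above are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    title: str
    country: str
    genre: str
    director: str


def search(movies, **criteria):
    """Return the movies that match every given attribute (case-insensitive)."""
    def matches(movie):
        return all(
            str(getattr(movie, attr)).lower() == str(value).lower()
            for attr, value in criteria.items()
        )
    return [m for m in movies if matches(m)]


# Invented sample entries, for illustration only.
CATALOGUE = [
    Movie("Example A", "Italy", "Drama", "Director One"),
    Movie("Example B", "Italy", "Comedy", "Director Two"),
    Movie("Example C", "Spain", "Drama", "Director One"),
]

if __name__ == "__main__":
    for movie in search(CATALOGUE, genre="drama", director="Director One"):
        print(movie.title)
```

Each extra keyword argument narrows the result, which mirrors querying on several attributes at once.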

gMOD under the hood

Built on top of gLite services:
- Storage Elements, located at different sites, physically hold the movie files.
- The LFC (the File Catalogue) keeps track of which Storage Element holds a particular movie.
- AMGA is the repository of the detailed information for each movie and makes it possible to query that information.
- The Virtual Organization Membership Service (VOMS) is used to assign the right role to each user.
- The Workload Management System (WMS) is responsible for retrieving the chosen movie from the right Storage Element and streaming it over the network to the user's desktop or laptop.
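The division of labour among these services can be sketched with plain dictionaries standing in for each one. This Python fragment is purely illustrative: it does not call VOMS, AMGA, the LFC or the WMS, and every user name, logical file name and host in it is invented. Picking the first replica is an arbitrary simplification.

```python
# All names, paths and hosts below are invented stand-ins.
USER_ROLES = {"alice": "TrailersManager", "bob": "guest"}  # plays the part of VOMS

MOVIE_METADATA = {  # plays the part of AMGA: logical file name -> attributes
    "/grid/gilda/movies/example-a.avi": {"Title": "Example A", "Genre": "Drama"},
    "/grid/gilda/movies/example-b.avi": {"Title": "Example B", "Genre": "Comedy"},
}

REPLICA_CATALOGUE = {  # plays the part of the LFC: logical file name -> Storage Elements
    "/grid/gilda/movies/example-a.avi": ["se1.example.org", "se2.example.org"],
    "/grid/gilda/movies/example-b.avi": ["se2.example.org"],
}


def find_movies(**attributes):
    """Metadata query: logical file names whose attributes all match."""
    return [
        lfn for lfn, meta in MOVIE_METADATA.items()
        if all(meta.get(key) == value for key, value in attributes.items())
    ]


def plan_streaming(user, **attributes):
    """Return (logical file name, storage element) pairs a streaming job could use."""
    if user not in USER_ROLES:
        raise PermissionError(f"{user} has no role in the VO")
    plan = []
    for lfn in find_movies(**attributes):
        replicas = REPLICA_CATALOGUE.get(lfn, [])
        if replicas:
            plan.append((lfn, replicas[0]))  # naive choice: first replica
    return plan


if __name__ == "__main__":
    print(plan_streaming("bob", Genre="Drama"))
```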

gMOD interactions (diagram): the User, the GENIUS Portal, VOMS (get role), the AMGA Metadata Catalogue, the LFC File Catalogue, the Workload Management System, a CE with its WNs, and the Storage Elements.

gMOD screenshot
gMOD is accessible through the GENIUS Portal (https://glite-demo.ct.infn.it)

GSAF (Grid Storage Access Framework)

GSAF is an object-oriented framework built on top of the Grid Metadata Service and the Grid Data Service. It exposes classes and related methods to applications.

Main objectives:
- hide the complexity and fragmentation of the several underlying APIs
- satisfy functional requirements shared among applications
- ensure atomicity among different data manipulation operations
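One way to read the atomicity objective is that storing a file and registering its metadata should either both succeed or both be undone. The Python sketch below illustrates that idea with a compensating delete. It is not GSAF code (GSAF is a Java framework); the two fake services and the `store_with_metadata` function are hypothetical.

```python
class FakeDataService:
    """Stand-in for a grid data service (hypothetical)."""

    def __init__(self):
        self.files = {}

    def put(self, name, content):
        self.files[name] = content

    def delete(self, name):
        self.files.pop(name, None)


class FakeMetadataService:
    """Stand-in for a grid metadata service (hypothetical)."""

    def __init__(self, fail=False):
        self.entries = {}
        self.fail = fail  # set to True to simulate an outage

    def add_entry(self, name, attributes):
        if self.fail:
            raise RuntimeError("metadata service unavailable")
        self.entries[name] = dict(attributes)


def store_with_metadata(data, meta, name, content, attributes):
    """Store file and metadata together; undo the file if metadata fails."""
    data.put(name, content)
    try:
        meta.add_entry(name, attributes)
    except Exception:
        data.delete(name)  # compensating action keeps both services consistent
        raise
```

With a failing metadata service, the call raises and leaves no orphan file behind; with a working one, both the file and its entry are present afterwards.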

GSAF System Architecture
- The core of the framework is designed as a set of plug-ins.
- Its design uses several object-oriented design patterns (Singleton, Strategy, Factory Method, Template Method, Iterator and Composite), which keeps the software architecture clean and simple, with high cohesion and loose coupling.
- Built on top of the Data Management Services of the Grid middleware.
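A plug-in core built around a Strategy interface and a factory might look like the Python sketch below. It only illustrates the two patterns in general terms; the class names, the registry and the `create_storage` function are invented and do not reflect the actual GSAF classes, which are written in Java.

```python
from abc import ABC, abstractmethod


class StorageStrategy(ABC):
    """Strategy interface: one way of storing a named blob."""

    @abstractmethod
    def store(self, name: str, content: bytes) -> str:
        """Store the content and return an identifier for it."""


_PLUGINS = {}  # plug-in registry: kind -> strategy class


def plugin(kind):
    """Class decorator that registers a strategy under a short name."""
    def register(cls):
        _PLUGINS[kind] = cls
        return cls
    return register


@plugin("memory")
class InMemoryStorage(StorageStrategy):
    def __init__(self):
        self.blobs = {}

    def store(self, name, content):
        self.blobs[name] = content
        return f"mem://{name}"


@plugin("null")
class DiscardingStorage(StorageStrategy):
    def store(self, name, content):
        return f"null://{name}"  # accepts the data but keeps nothing


def create_storage(kind) -> StorageStrategy:
    """Factory: build the strategy registered for `kind`."""
    try:
        return _PLUGINS[kind]()
    except KeyError:
        raise ValueError(f"unknown storage plug-in: {kind}") from None
```

Callers depend only on `StorageStrategy` and `create_storage`, so a new back end can be added by registering one more class without touching existing code.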

Class Diagram (figure): Strategy Pattern, Iterator Pattern, Factory Method Pattern, Composite Pattern, Command Pattern; Object Model with Logical Entities / Java Object mapping.

ADAT project

What is gLibrary

The gLibrary challenge is to offer a multiplatform, flexible, secure and intuitive system to handle digital assets on a Grid infrastructure.

By digital asset, we mean any kind of content and/or media represented as a computer file. Examples:
- Images
- Videos
- Presentations
- Office documents
- E-mails, web pages
- Newsletters, brochures, bulletins, sheets, templates
- Receipts, e-books...
(imagination is the only limit)

It allows users to store, organize, search and retrieve those assets in a Grid environment.

Store assets on the Grid
- The user's local assets are uploaded to one or more Storage Systems (as replicas) on which the user is authorized.
- Uploads are managed through Java applets: a direct GSIFTP copy is made from the local file to the chosen Storage Element.
- Files already on the Grid can be managed by gLibrary too: a File Catalogue browser is integrated to select existing grid files.

Organize assets

All entries are organized according to their type, which is:
- a list of specific attributes describing each kind of asset managed by the system;
- hierarchical (a child type shares its parent's attributes);
- defined by the gLibrary administrators;
- queried by users.

EXAMPLE OF TYPES AND ATTRIBUTES LIST
Type | Attributes list
Audio | Format, Bitrate, Samplerate, Time
Music | (Format, Bitrate, Samplerate, Time), Name, Artist, Album, Genre, Tracknumber, Year, Artwork, Lyric, Rating
Presentation | Format, NumOfPages
Training | (Format, NumOfPages), Title, Runtime, Speaker, Author, Subject, Event, Date, Type
(Root) | FileName, SubmissionDate, Description, Keywords, LastModificationDate, Size

Assets can also be organized by category:
- Categories group together related assets of different types.
- They are also useful to define subsets of assets belonging to the same type.
- Multiple categories can be assigned to one asset.
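The inheritance rule for types (a child type shares its parent's attributes) can be made concrete with a short Python sketch. The attribute names come from the example table; the `TYPES` dictionary layout and the `all_attributes` function are invented for illustration, and it is an assumption that Audio and Presentation are direct children of (Root).

```python
# type name -> (parent type or None, attributes the type adds itself)
TYPES = {
    "Root": (None, ["FileName", "SubmissionDate", "Description",
                    "Keywords", "LastModificationDate", "Size"]),
    "Audio": ("Root", ["Format", "Bitrate", "Samplerate", "Time"]),
    "Music": ("Audio", ["Name", "Artist", "Album", "Genre", "Tracknumber",
                        "Year", "Artwork", "Lyric", "Rating"]),
    "Presentation": ("Root", ["Format", "NumOfPages"]),
    "Training": ("Presentation", ["Title", "Runtime", "Speaker", "Author",
                                  "Subject", "Event", "Date", "Type"]),
}


def all_attributes(type_name):
    """Full attribute list of a type: ancestors' attributes first, own last."""
    chain = []
    while type_name is not None:
        parent, own = TYPES[type_name]
        chain.append(own)
        type_name = parent
    return [attr for own in reversed(chain) for attr in own]


if __name__ == "__main__":
    print(all_attributes("Music"))
```

Walking up the parent links keeps each type's definition short while still giving every asset the complete set of attributes it can be queried on.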

Search assets
- Assets are browsed by selecting a type (or category) and then one or more filters: type attributes chosen from a defined list, used to narrow the result set.
- Filter application is cascading and context-sensitive: the selection of a filter value dynamically influences subsequent filter values (à la iTunes browser).
- Classic search is available too.
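Cascading, context-sensitive filters can be sketched as a function that, given the filter values already chosen, returns the values still available for the next filter. This Python fragment is an illustration only, not gLibrary's implementation (which the slides describe as PHP and JavaScript); the function name and the sample assets are invented.

```python
def available_values(assets, attribute, selected):
    """Values of `attribute` still possible given the filters already selected."""
    remaining = [
        asset for asset in assets
        if all(asset.get(name) == value for name, value in selected.items())
    ]
    return sorted({asset[attribute] for asset in remaining if attribute in asset})


# Invented sample assets of a Music-like type.
ASSETS = [
    {"Genre": "Jazz", "Artist": "Artist A", "Album": "Album 1"},
    {"Genre": "Jazz", "Artist": "Artist B", "Album": "Album 2"},
    {"Genre": "Rock", "Artist": "Artist C", "Album": "Album 3"},
    {"Genre": "Rock", "Artist": "Artist A", "Album": "Album 4"},
]

if __name__ == "__main__":
    print(available_values(ASSETS, "Genre", {}))
    print(available_values(ASSETS, "Artist", {"Genre": "Jazz"}))
```

Choosing a genre first shrinks the list of artists offered next, and choosing an artist then shrinks the list of albums.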

Retrieve assets from the Grid
- The user is presented with a list of asset replicas.
- Downloading from the chosen Storage Element is a matter of a mouse click.
- The transfer is handled over GridFTP by a Java applet.

Features
- Implemented as a Web 2.0 application
- AJAX and JavaScript are heavily used to offer a desktop-like user experience
- Business logic implemented using PHP 5 OOP support

Browsing screenshot

Entry detail screenshot

Upload screenshot

Architecture overview (diagram)
Components: User, Login applet, Upload/Download applet, VOMS Server, LFC File Catalogue, AMGA Metadata Catalogue, Storage Elements (SE).
Numbered steps:
1. local proxy creation
2. proxy transfer over HTTPS
3. get role
4. find the right asset
5. proxy retrieved over HTTPS
6. direct transfer from SE

Conclusion

AMGA: the Metadata Service of gLite
- Part of gLite 3.1
- Useful to realize simple relational schemas
- Integrated in the Grid environment (security)
- Replication/federation features
- Importing existing databases
- Tests show good performance/scalability
- Already deployed by several Grid applications: LHCb, ATLAS, Biomed, gMOD, gLibrary, ADAT

References
- AMGA Web Site: http://cern.ch/amga
- AMGA Manual: http://amga.web.cern.ch/amga/downloads/amgamanual_1_3_0.pdf
- AMGA API Javadoc: http://amga.web.cern.ch/amga/javadoc/index.html
- AMGA Web Frontend: http://gilda-forge.ct.infn.it/projects/amgawi/
- AMGA Basic Tutorial: https://grid.ct.infn.it/twiki/bin/view/gilda/amgahandson

Questions