Software technologies for integration of process and data in medical imaging
NeuroLOG WP1: Sharing Data & Metadata
Franck MICHEL
Paris, May 18th, 2010
NeuroLOG ANR-06-TLOG-024
http://neurolog.polytech.unice.fr
Definitions
- Data: image files
- Metadata: data about data, that is, any data related to image files
  - Type of image, modality, processing
  - Examination, acquisition equipment
  - Subject: age, gender, pathology
Partner sites own specific databases
- Specific database providers, OS
- Specific databases should be comparable
  - Same concerns: they manage the same major entities
  - Heterogeneous: differences in the database design (schema)
WP1 goals
- Define a way to share a common view: a cornerstone
  - Definition of a federated relational schema
  - Data Federator to map specific schemas to the federated schema
  - Comes with the need to ensure global coherency
- Define a way to share image files described by metadata
  - Files distributed over distant sites
  - Heterogeneous file systems, resource storage units
  - Need for each site to keep control of its data = weak coupling
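The idea of mapping heterogeneous site schemas onto one federated schema can be illustrated with a minimal sketch. It does not use Data Federator itself: SQLite views stand in for the mappings, and every table name, column name, and value here (`site_a_patient`, `site_b_subject`, `subject`, the reference year 2010) is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two hypothetical site-specific tables with different designs.
CREATE TABLE site_a_patient (pid INTEGER, sex TEXT, birth_year INTEGER);
CREATE TABLE site_b_subject (code TEXT, gender TEXT, age INTEGER);
INSERT INTO site_a_patient VALUES (1, 'F', 1960), (2, 'M', 1985);
INSERT INTO site_b_subject VALUES ('S-07', 'female', 52);

-- Federated view: one mapping per site, all exposing the same columns.
CREATE VIEW subject AS
  SELECT 'site_a' AS site, CAST(pid AS TEXT) AS subject_id,
         sex AS gender, 2010 - birth_year AS age
  FROM site_a_patient
  UNION ALL
  SELECT 'site_b', code,
         CASE gender WHEN 'female' THEN 'F' WHEN 'male' THEN 'M' END,
         age
  FROM site_b_subject;
""")


def federated_subjects():
    """Return all subjects as seen through the federated view."""
    rows = conn.execute(
        "SELECT site, subject_id, gender, age FROM subject "
        "ORDER BY site, subject_id"
    )
    return rows.fetchall()
```

Each site keeps its own table untouched (weak coupling); only the mapping knows how to translate it into the shared columns.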
WP1: Data Module (III)
Performs transversal search of information through a set of local repositories by using specific adapters to local information.
[Architecture diagram, modules numbered I to V. Components shown: User Authentication; Client Application; Semantic Repository; METAmorphoses (SQL to RDF); Semantic Queries Engine (CORESE); Query Interface; Visualizer; Computing Interface; MOTEUR Workflow Engine; Other optimization and context-aware service; Data sets and workflows; Grid Application Service Wrapper; Grid Interface; Data Federator (DF) with SQL and Authorization; Site-Specific DB (SsDB: metadata + image files); NeuroLOG Data Base (NL DB); Grid Storage (images)]
Design process
- Definition of the ontology
- Definition of the federated relational schema
  - Derived from the ontology
  - All sites will align their own database on this schema
- Definition of site-specific Data Federator mappings
- Definition of NeuroLOG services exposed to the client application by the NeuroLOG server
Design cycle
[Cycle diagram linking: Ontology, NeuroLOG schema, Map site-specific database, Adapt site-specific database, Adapt ontology]
How relevant is the federated view of a specific table?
- A mapping may come up with relations that do not exist in the specific database
  - Is it consistent with the semantics of the site-specific database?
- A mapping may lose information or narrow a concept
  - E.g.: Left/Right vs. Left/Right/Converted Left/Ambidextrous
  - Acceptable loss?
- A mapping may broaden a concept
  - Make sure not to come up with inconsistent data
  - Ensure consistency with data from other sites
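The handedness example can be made concrete with a small sketch of a lossy mapping. The target values chosen for "Converted Left" and "Ambidextrous" are assumptions for illustration only; the slides do not say how the project resolved them.

```python
# Federated vocabulary assumed here: only "Left" and "Right".
# Site vocabulary: Left / Right / Converted Left / Ambidextrous.
SITE_TO_FEDERATED = {
    "Left": "Left",
    "Right": "Right",
    "Converted Left": "Left",   # assumption: the "converted" nuance is dropped
    "Ambidextrous": None,       # assumption: no federated equivalent
}


def to_federated(site_value):
    """Map a site value to (federated_value, lossy).

    lossy is True whenever the federated value no longer carries
    everything the site value said.
    """
    federated = SITE_TO_FEDERATED[site_value]
    return federated, federated != site_value


def lossy_values(site_values):
    """List the distinct site values that lose information when mapped."""
    return sorted({v for v in site_values if to_federated(v)[1]})
```

Running `lossy_values` over a site's actual data is one way to judge whether the loss is acceptable before agreeing on the mapping.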
Distribution comes with issues
- Keeping multiple databases coherent may be challenging
  - Internally to a site: the site-specific DB is managed independently from the middleware
  - Externally: each site is autonomous
- Need to achieve a compromise between distributed coherency and site autonomy
- Need to handle
  - Cross-references between entities
  - Metadata schema distribution
  - Entity replication: some entities are unique (e.g. Image) while others may be found on several sites (e.g. Subject, Study)
Get access to files distributed over distant sites
- Browse, retrieve and store local or remote data files, either on NeuroLOG sites or on the EGEE Grid infrastructure
- Provide file transfer services between NeuroLOG servers, or between NeuroLOG servers and clients
Deal with heterogeneous file systems and access protocols
- Standard protocols: local file, ftp/sftp, http/https
- Grid protocols: GridFTP, LFC
- Provide some sort of virtual file system
Enforce a security policy
- Leverage the Security Layer
  - Check user authorizations to access files
  - Secure transfer
- Rely on the Grid security infrastructure
  - Grid certificates, Grid proxy
Need to provide an interface for managing files
- In the manner of a virtual file system
- But do not build yet another full virtual file system
[Diagram components: NeuroLOG Client, Data Manager (three instances), Local storage resources, GRID Controller]
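A thin interface "in the manner of a virtual file system" could be sketched as a dispatcher keyed on the URL scheme. This is not the NeuroLOG Data Manager API: the class and function names are hypothetical, and only a local-file handler is implemented; handlers for ftp, http or GridFTP would be registered the same way.

```python
from urllib.parse import urlparse


class DataManager:
    """Hypothetical front end that routes a URL to a protocol handler."""

    def __init__(self):
        self._handlers = {}  # URL scheme -> function(url) returning bytes

    def register(self, scheme, handler):
        self._handlers[scheme] = handler

    def read(self, url):
        # A URL without a scheme is treated as a local file path.
        scheme = urlparse(url).scheme or "file"
        if scheme not in self._handlers:
            raise ValueError(f"no handler registered for scheme {scheme!r}")
        return self._handlers[scheme](url)


def read_local(url):
    """Handler for local files, given as a plain path or a file:// URL."""
    with open(urlparse(url).path, "rb") as f:
        return f.read()
```

The caller only ever sees `read(url)`; supporting a new storage type means registering one more handler rather than growing a full file system layer.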
Files exposed to client for direct transfer
[Diagram, two cases]
- File on the grid: the client uses its Grid certificate to get the file through GridFTP from the Grid Storage Element
- File on a site: the client uses its NeuroLOG certificate to require an accessible file copy from a Data Manager
  - The Data Manager delegates the request to the owning site server (incl. credentials)
  - Check authorizations
  - Make a temporary copy of the file on an accessible file server
  - The client gets the file by URL
- Site 1 and Site 2 each run a Data Manager over local storage resources (file, http, ftp)
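The delegation flow on this slide can be sketched as follows. It is a simplified model under stated assumptions: `SiteServer`, `request_copy`, and the in-memory reader lists are hypothetical names, a temporary directory stands in for the accessible file server, and certificate handling is reduced to passing a user name.

```python
import shutil
import tempfile
from pathlib import Path


class SiteServer:
    """Hypothetical stand-in for a site's Data Manager."""

    def __init__(self, name, files, readers):
        self.name = name
        self.files = files      # file id -> Path of the file in local storage
        self.readers = readers  # file id -> set of users allowed to read it
        self.peers = {}         # site name -> SiteServer of that site
        # Directory standing in for the "accessible file server".
        self.export_dir = Path(tempfile.mkdtemp(prefix=name + "-export-"))

    def request_copy(self, user, owner_site, file_id):
        """Return a URL from which `user` can fetch a temporary copy."""
        if owner_site != self.name:
            # Delegate to the owning site, passing the user's identity along.
            return self.peers[owner_site].request_copy(user, owner_site, file_id)
        # The owning site checks authorizations before exposing anything.
        if user not in self.readers.get(file_id, set()):
            raise PermissionError(f"{user} may not read {file_id}")
        copy = self.export_dir / self.files[file_id].name
        shutil.copy(self.files[file_id], copy)
        return copy.as_uri()
```

The authorization decision stays with the site that owns the file, which matches the weak-coupling goal: the requesting site never reads the owner's storage directly.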
Example: processing remote files
[Diagram: client holding a Grid certificate and a NeuroLOG certificate, Data Managers at Site 1 and Site 2, Processing Tools, Grid Storage Element]
- The client requires processing of a file on the grid
- The Data Manager delegates the request to the owning site server (incl. credentials)
  - Check authorizations
  - Send a copy of the file to the grid storage resource
- The Data Manager requires processing from the Processing Tools
- Result files are returned to the client
5 sites deployed: ASCLEPIOS, GIN, I3S, IFR49, IRISA
[Deployment diagram: NeuroLOG services expose a federated view of metadata. Each of the five sites (IRISA, I3S, GIN, IFR49, ASCLEPIOS) runs a NeuroLOG server and a Data Federator that returns results from its site database. Databases shown: InriaNeuroTK, GIN-DMS, CAC, Shanoir]
Thank you
Any engineer position? Available January 1st, 2011.
Backup slides
Data & Metadata GUI
NeuroLOG Server exposes a set of services
- Web Services interface
- Query federated metadata
  - Search by criteria (subjects, studies, datasets)
  - Browse through federated metadata
  - E.g. get datasets for subjects older than 40, produced in studies started after 2007
- Download dataset files based on global sharing policy
  - Save downloaded datasets to customer directory
- Query processing tools using datasets selected from metadata
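The example query on this slide (datasets for subjects older than 40, produced in studies started after 2007) can be written against a toy federated schema. The tables, columns, and sample rows are assumptions for illustration; the actual NeuroLOG federated schema is not shown in these slides.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical federated schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE subject (subject_id TEXT PRIMARY KEY, age INTEGER);
    CREATE TABLE study   (study_id TEXT PRIMARY KEY, start_year INTEGER);
    CREATE TABLE dataset (dataset_id TEXT PRIMARY KEY,
                          subject_id TEXT, study_id TEXT);
    INSERT INTO subject VALUES ('s1', 45), ('s2', 30);
    INSERT INTO study   VALUES ('st1', 2008), ('st2', 2005);
    INSERT INTO dataset VALUES ('d1', 's1', 'st1'),
                               ('d2', 's1', 'st2'),
                               ('d3', 's2', 'st1');
    """)
    return conn


def find_datasets(conn, min_age, started_after):
    """Datasets of subjects older than min_age, from studies started after a year."""
    rows = conn.execute(
        """
        SELECT d.dataset_id
        FROM dataset d
        JOIN subject s ON s.subject_id = d.subject_id
        JOIN study   t ON t.study_id   = d.study_id
        WHERE s.age > ? AND t.start_year > ?
        ORDER BY d.dataset_id
        """,
        (min_age, started_after),
    )
    return [row[0] for row in rows]
```

With the sample rows, `find_datasets(make_demo_db(), 40, 2007)` keeps only the dataset that satisfies both criteria at once.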
NeuroLOG client GUI - Querying metadata
What for: gather datasets in a cart
- Use datasets of the cart as inputs to:
  - Visualization tools (Viscioscopie)
  - Processing workflows
  - Download for further local processing
Several ways of querying metadata in the client GUI
- Fill parameters of multi-criteria predefined queries
  - To be defined on a user-needs basis
- Browse through metadata
  - Browsing follows branches of a browsing tree
  - Browsing tree likely to evolve along with user feedback
  - Designed in an easy-to-maintain way
- Explore metadata from a given root
Current browsing tree
[Tree diagram. Nodes shown: root, Investigator, Centre, Study, Subject, Experimental group of subjects, Dataset, Dataset (input of study), Dataset (result of study). Legend: Entity: tree leaf]
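One way to keep a browsing tree easy to maintain is to describe it as data rather than code. The sketch below assumes a particular set of branches for illustration; it is not the actual NeuroLOG browsing tree, whose exact shape cannot be recovered from the flattened diagram.

```python
# Hypothetical browsing tree: each node type lists the node types that can be
# browsed from it. Changing the tree means editing this table only.
BROWSING_TREE = {
    "root": ["Investigator", "Centre", "Subject", "Study"],
    "Investigator": ["Study"],
    "Centre": ["Subject"],
    "Subject": ["Dataset"],
    "Study": ["Dataset (input of study)", "Dataset (result of study)"],
}


def branches(node):
    """Node types reachable in one step from `node` (empty for a leaf)."""
    return BROWSING_TREE.get(node, [])


def paths(node="root", prefix=()):
    """Yield every path from `node` down to a leaf, as a tuple of node types."""
    current = prefix + (node,)
    children = branches(node)
    if not children:
        yield current
    for child in children:
        yield from paths(child, current)
```

A GUI could build its tree widget by walking `branches`, so that user feedback translates into a change in one table.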
Start
Search studies
View details
Explore metadata
Search subjects involved in selected studies
Search subjects involved in selected study
Search datasets related to selected subjects, produced in the selected study
Download datasets from the cart
View downloaded datasets
Authorization
- If the user has no right to read the requested dataset, the user should subscribe to the appropriate role
- The administrator of the site that manages the role gets the request
  - The administrator grants the user the requested role
  - The user can then restart the download
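The authorization steps (refused download, role request, administrator grant, retried download) can be summarized in a short sketch. `RoleManager` and its methods are hypothetical names; the real NeuroLOG security layer and its role model are not described in these slides.

```python
class RoleManager:
    """Hypothetical per-site role registry with administrator approval."""

    def __init__(self):
        self.granted = set()  # (user, role) pairs already granted
        self.pending = []     # role requests awaiting the site administrator

    def download(self, user, dataset_id, required_role):
        """Refuse the download unless the user holds the required role."""
        if (user, required_role) not in self.granted:
            raise PermissionError(f"{user} lacks role {required_role!r}")
        return f"downloaded {dataset_id}"

    def request_role(self, user, role):
        """Called by the user after a refused download."""
        self.pending.append((user, role))

    def approve_pending(self):
        """Called by the administrator of the site that manages the role."""
        self.granted.update(self.pending)
        self.pending.clear()
```

A session would call `download`, catch the `PermissionError`, call `request_role`, and retry once the administrator has run `approve_pending`.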
Apache Tomcat
Metro JAX-WS