Software technologies for integration of process and data in medical imaging
The NeuroLOG Platform: Federating multi-centric neuroscience resources
Johan MONTAGNAT, Franck MICHEL
Vilnius, Apr. 13th, 2011
ANR-06-TLOG-024
http://neurolog.polytech.unice.fr
Neurosciences requirements
- Major challenge for this century: population aging, growth of brain disorders, brain function understanding...
- Large medical image databases
  - Statistical studies
  - Population-specific atlases of the brain
  - Data-intensive procedures
- Heterogeneous data sets
  - Different acquisition conditions and centers
  - Several imaging modalities
  - Associated clinical information
- Complex data analysis procedures
  - Specific to some modalities and acquisition parameters
  - Minutes to hours of computation time each
  - Chained into application pipelines (workflows)
- Sensitive data
  - Stringent access control requirements
Collaborative approach
- Sharing computing algorithms and resources
- Research (population studies, model design, validation, statistics)
- Complex analysis algorithms & pipelines (compute-intensive image processing, time constraints...)
- Data, procedures, processing tools, computing power
Brain atrophy measure workflow
- Detecting longitudinal brain volume change is an issue of central relevance in neuroimaging.
- Early diagnosis of neurodegenerative diseases (e.g. Alzheimer's).
- Reduced costs in clinical trials, increased power in longitudinal studies.
Inputs: longitudinal study
- Baseline image (T0)
- Other time point images (T0 + 6 months, T0 + 12 months...)
Image normalization
- Space alignment (registration)
- Intensity alignment
Parameters extraction
- Brain extraction (mask)
- Quantitative parameters (atrophy measurement for Alzheimer's disease diagnosis)
  - Values shown on the slide: 1044901-10294 -0.009; 1044901-10484 -0.010
- Deformation field computation
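To make the idea of a quantitative atrophy parameter concrete, the relative change between a baseline and a follow-up brain volume can be sketched as below. This is only an illustration under stated assumptions: the slide indicates that the actual pipeline relies on a deformation field computation, which is not reproduced here, and the function name and sample volumes are hypothetical.

```python
def relative_volume_change(baseline_ml: float, followup_ml: float) -> float:
    """Return (followup - baseline) / baseline.

    Negative values indicate a loss of volume (atrophy).
    """
    if baseline_ml <= 0:
        raise ValueError("baseline volume must be positive")
    return (followup_ml - baseline_ml) / baseline_ml


# Hypothetical brain volumes in millilitres (not taken from the slides).
change = relative_volume_change(1200.0, 1188.0)
print(f"{change:+.3f}")
```

A signed ratio keeps the direction of the change visible, which matters when follow-up time points are compared against the same baseline (T0).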
Generic infrastructure limitations
- The grid provides a foundational layer for distributed, intensive computing
  - Distributed files, large number of computing tasks
- Gap between grid infrastructures and the medical environment
  - Low-level foundational middleware
  - Complex requirements from the health community
- Need for neuroradiological data integration
  - Domain-specific data representation, mediation for existing databases
- Legacy neuroscience computing environments
  - Bridging local and grid resources
- Neurology data analysis pipelines
  - Need to integrate neuro-data analysis codes and procedures
- Access control and privacy
  - The foundational security layer needs to be refined with adapted security policies
Objectives
- Enable the sharing of resources:
  - Data & knowledge representation
    - Ontologies + relational schema
  - Neuroradiological data & associated metadata
    - Distributed across neuroscience centers + EGI grid resources
    - Integration of heterogeneous data stores
  - Image analysis tools
    - Bundled, relocatable, remote invocation
  - Application pipelines
  - Sites' computing resources
- Four pathologies
  - Multiple sclerosis, brain strokes, brain tumors, Alzheimer's disease
Middleware design
Software architecture
Platform deployment
- 5 sites connected
- 4 collaborating hospitals
  - Pitié-Salpêtrière (Paris)
  - Michallon (Grenoble)
  - CHU Rennes
  - Antoine Lacassagne (Nice)
- 7 academic partners
  - I3S, IRISA, GIN, MIS, IFR49, INRIA Sophia, LRI
- 2 companies
  - SAP, Visioscopie
Data management layer
- Provide seamless access to heterogeneous distributed data
  - Heterogeneous data (modality, clinical context)
  - Heterogeneous legacy database providers & schemas
  - Heterogeneous file systems, resource storage units (local, grid)
- Need to provide:
  - Federated view of the metadata
  - Common access to physical files
- While enforcing strong constraints:
  - Each partner site should keep control of access to its data
  - Keep autonomous data management on each site: weak coupling
  - Legacy data stores should not be altered
  - Ensure secure access to sensitive data and metadata
Ontology, common relational schema, federated relational schema
[Figure: schema diagram; labels recovered from the slide: Variables, Study, Instrument, Scores, Examination, Assessment, Subject, MR Protocols, Dataset]
Data management layer: approach
- Derive a relational federated schema from the ontology
- A dynamic mediation & federation interface maps local database schemas to the federated schema
- A file transfer interface makes files available to the end-user or processing tools
- Result: a global federated view that hides data distribution and heterogeneity from the end-user
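The mediation step described above can be pictured as a per-site mapping from local column names to federated ones, with the records from all sites merged into a single view. The following is a minimal sketch under that assumption; the site names, column names, and functions are all hypothetical and do not describe the platform's actual mediation interface.

```python
from typing import Dict, List

# Hypothetical mappings: local column name -> federated column name.
SITE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "site_a": {"patient_id": "subject_id", "modality": "dataset_modality"},
    "site_b": {"subj": "subject_id", "mod": "dataset_modality"},
}


def to_federated(site: str, row: Dict[str, str]) -> Dict[str, str]:
    """Rename the columns of one local row; unmapped columns are dropped."""
    mapping = SITE_MAPPINGS[site]
    return {mapping[col]: value for col, value in row.items() if col in mapping}


def federated_view(local_rows: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Merge rows from every site into one list using federated column names."""
    view = []
    for site, rows in local_rows.items():
        for row in rows:
            record = to_federated(site, row)
            record["site"] = site  # keep provenance; the local store is untouched
            view.append(record)
    return view


rows = {
    "site_a": [{"patient_id": "A1", "modality": "MR", "ward": "3"}],
    "site_b": [{"subj": "B7", "mod": "MR"}],
}
for record in federated_view(rows):
    print(record)
```

Because the mapping is applied when the view is built, the legacy stores keep their own schemas, which matches the constraint that they should not be altered.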
Sharing image analysis tools
- Generic Application Service Wrapper (GASW)
  - Service wrapper for non-instrumented code
  - Tool packaging in relocatable, self-contained executable units
  - Expose tools as web services, standard invocation interface
  - Handle data transfer
  - Remote execution capability on the EGI grid
- Tool discovery through the federated view
- Executable remotely by any authorized user
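The general idea of wrapping non-instrumented code behind a uniform invocation interface can be sketched as follows. This is not GASW's descriptor format or API: the `ToolDescriptor` class, its fields, and the demo tool are hypothetical, and the example runs a local process instead of a web service or grid job.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToolDescriptor:
    """Hypothetical description of a command-line tool."""
    name: str
    executable: str
    arg_template: List[str]  # placeholders such as "{message}" are filled from inputs


def build_command(tool: ToolDescriptor, inputs: Dict[str, str]) -> List[str]:
    """Turn named inputs into the tool's concrete command line."""
    return [tool.executable] + [arg.format(**inputs) for arg in tool.arg_template]


def invoke(tool: ToolDescriptor, inputs: Dict[str, str]) -> str:
    """Run the tool locally and return its standard output."""
    result = subprocess.run(
        build_command(tool, inputs), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Demo tool: the Python interpreter itself stands in for a legacy binary.
echo_tool = ToolDescriptor("echo", sys.executable, ["-c", "print('{message}')"])
print(invoke(echo_tool, {"message": "hello"}))
```

Keeping the command line in a declarative descriptor means the wrapped tool itself needs no modification, which is the property the slide refers to as non-instrumented code.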
Enabling processing pipelines
- MOTEUR workflow engine
  - Generic workflow design and execution
  - Support for different interfaces to processors
  - Data and processing parallelism
  - Handles stand-alone (client) and client-server deployment
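Data parallelism in a pipeline means that independent inputs go through the same chain of steps concurrently. The sketch below shows that idea with Python's standard thread pool; it is not MOTEUR code, and the two placeholder steps and their names are hypothetical stand-ins for real processing tools.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def normalize(image: str) -> str:
    """Placeholder for an image normalization step."""
    return f"normalized({image})"


def extract(image: str) -> str:
    """Placeholder for a parameter extraction step."""
    return f"params({image})"


def run_pipeline(image: str) -> str:
    """Chain the two steps for a single input."""
    return extract(normalize(image))


def run_all(images: List[str], max_workers: int = 4) -> List[str]:
    """Process independent inputs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_pipeline, images))


print(run_all(["T0", "T0+6m", "T0+12m"]))
```

Only data parallelism is shown here; running different steps of the chain at the same time on different inputs would need a queue between the steps.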
Distributed data access control
- Multiple credentials per user
  - Grid certificates (delivered by grid authority)
  - Middleware certificates (delivered by site authority)
  - Database credentials (SQL 92)
  - Health professional smartcards
- Single sign-on enforced security policy
  - Identification of individuals
- Distributed security administration
  - No central point of control
  - Sites keep access control over all their data
  - Adapts to heterogeneous site security policies
Distributed data access control
- Each site's data access policy prevails for the data items the site owns
- rule : { StudyA ; read ; }
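One way to read "the owning site's policy prevails" is that an access request is always evaluated against the rules of the site that owns the data item, and denied by default. The sketch below assumes that reading. The rule on the slide has an empty third field; the `grantee` field used here, like the site, group, and study assignments, is a hypothetical stand-in and not the platform's rule format.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Rule:
    resource: str
    action: str
    grantee: str  # hypothetical: a group allowed to perform the action


# Hypothetical ownership table and per-site policies.
DATA_OWNER: Dict[str, str] = {"StudyA": "site_a"}
SITE_POLICIES: Dict[str, List[Rule]] = {
    "site_a": [Rule("StudyA", "read", "group_x")],
}


def is_allowed(user_groups: Set[str], resource: str, action: str) -> bool:
    """Evaluate a request against the owning site's rules only; deny by default."""
    owner = DATA_OWNER.get(resource)
    if owner is None:
        return False
    return any(
        rule.resource == resource
        and rule.action == action
        and rule.grantee in user_groups
        for rule in SITE_POLICIES.get(owner, [])
    )


print(is_allowed({"group_x"}, "StudyA", "read"))
print(is_allowed({"group_x"}, "StudyA", "write"))
```

Looking up the owner first keeps the decision local to one site, so no central authority is needed to evaluate a request.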
Results
- Collaborating platform for multi-centric studies
  - Integrate heterogeneous & distributed legacy data sets
  - Share image analysis tools, distribute invocation
  - Build complex experiment pipelines
  - Distributed access control with prevailing local policies
- Advanced functionality
  - High-level ontology-based data representation
  - EGI grid interface, large-scale distributed processing
- The grid for neuroscientists
  - Transparent access to grid resources
  - Compliance with legacy environments
Limitations
- Semantic validation not yet integrated
  - At processing tool annotation time: check compatibility of inputs/outputs with DataSet Processing class constraints
  - At workflow design time: user-assisted composition checks compatibility of inputs/outputs of composed services
  - At run time: check validity of actual inputs for each service
- Produced semantic data is still limited
  - Developments ongoing to provide richer semantic description of produced datasets using reasoning
- EGI interface still limited
  - Integration work ongoing to complete remote invocation and retrieval of results from storage elements
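The design-time check mentioned above amounts to verifying that the output type of one service is acceptable as the input type of the next. A minimal sketch follows, assuming a simple single-inheritance class hierarchy; the class names and functions are hypothetical and are not taken from the platform's ontology, and no reasoner is involved.

```python
from typing import Dict, Optional

# Hypothetical hierarchy: class name -> parent class name.
SUBCLASS_OF: Dict[str, str] = {
    "T1Image": "MRImage",
    "MRImage": "Dataset",
    "BrainMask": "Dataset",
}


def is_a(type_name: Optional[str], expected: str) -> bool:
    """Return True if type_name equals expected or descends from it."""
    while type_name is not None:
        if type_name == expected:
            return True
        type_name = SUBCLASS_OF.get(type_name)
    return False


def check_link(producer_output: str, consumer_input: str) -> bool:
    """A link is valid when the produced type satisfies the consumed type."""
    return is_a(producer_output, consumer_input)


print(check_link("T1Image", "MRImage"))
print(check_link("BrainMask", "MRImage"))
```

The same predicate could in principle be applied at run time to the actual inputs of each service, once those inputs carry a type annotation.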
Global picture
[Figure: overview diagram of the platform; recoverable labels: S2, S3, S4, Workflow Manager]
Thank you