Session 2 A virtual Observatory for TerraSAR-X data

3rd User Community Workshop, Chania, 12-14 June 2013
Presenters: Mihai Datcu and Daniela Espinoza Molina (DLR)

Motivation
ENVISAT provided measurements of the atmosphere, ocean, land, and ice over 10 years of operations, generating a data archive that reaches many petabytes.
The TerraSAR-X archive holds more than 100,000 scenes covering the majority of the Earth's surface.
The Sentinels (1-5) will be launched and will contribute data for land monitoring, storing several petabytes.
Topics: the data archive; the data access.

TELEIOS Architecture

Concepts and Components
Components: Data Sources; Data Model Generation; DBMS; Query, Data Mining & Knowledge Discovery; Visual Data Mining; Interpretation & Understanding; Users

Data Model Generation
Data sources: EO images, metadata, GIS.
Analysis stages: content analysis (image content, metadata content, GIS content) and context analysis.
Processing steps: metadata extraction, patch generation, feature extraction.
Feature extraction methods: Gabor filters, Weber local descriptor, bag of words, dictionary-based compression features, etc.
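The patch generation step can be sketched as a simple tiling of the image. This is only an illustration under assumptions: the image is a plain row-major list of lists, and the function name `generate_patches` and its `step` parameter (a step smaller than the patch size gives overlapping patches) are hypothetical, not part of the system shown on the slides.

```python
from typing import Iterator, List, Tuple

Image = List[List[float]]  # row-major grid of pixel values


def generate_patches(image: Image, patch_size: int, step: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (top, left, patch) for every full patch_size x patch_size tile.

    Tiles that would run past the image border are skipped.
    """
    if patch_size <= 0 or step <= 0:
        raise ValueError("patch_size and step must be positive")
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height - patch_size + 1, step):
        for left in range(0, width - patch_size + 1, step):
            patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            yield top, left, patch


# Toy 4 x 6 image whose pixel value encodes its position.
demo_image = [[float(10 * r + c) for c in range(6)] for r in range(4)]
non_overlapping = list(generate_patches(demo_image, patch_size=2, step=2))
overlapping = list(generate_patches(demo_image, patch_size=2, step=1))
```

Each patch would then be passed to a feature extraction method such as the ones listed on the slide.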

Data Model Generation: Ingested images
Figure: location of the 100 TerraSAR-X scenes and their distribution over the world.
The data model is composed of ~110,000 patches and ~320 semantic categories.
C.O. Dumitru and M. Datcu, Information Content of Very High Resolution SAR Images: Study of Feature Extraction and Imaging Parameters, IEEE Trans. Geoscience and Remote Sensing, to be published.

Database Management System
Strabon DBMS
Database scheme: relational DB.
Some data mining functions: similarity metrics, distances, SQL functions for querying.
Example query fragment:
SQL_RETRIEVE_QUERY_BY_DICTIONARY = """
WITH user_dict_size AS (
    SELECT uls.id AS id, uls.label AS label, COUNT(*) AS cnt
    FROM user_labels AS uls, user_dictionaries AS uds
    WHERE uls.label LIKE '%(user_label_label)s'
      AND uls.id = uds.user_label_id
    GROUP BY uls.id, uls.label
),

Database Management System
Example of data model for Earth-Observation images
Fast Compression Distance computation:
SQL_RETRIEVE_QUERY_BY_DICTIONARY = """
WITH user_dict_size AS (
    SELECT uls.id AS id, uls.label AS label, COUNT(*) AS cnt
    FROM user_labels AS uls, user_dictionaries AS uds
    WHERE uls.label LIKE '%(user_label_label)s'
      AND uls.id = uds.user_label_id
    GROUP BY uls.id, uls.label
),
inter_size AS (
    -- size of dictionary intersection for all pairs of patches
    SELECT ud1.user_label_id AS user_label_1, ul1.label AS label_1,
           d2.patch_id AS patch_2, p2.label AS label_2,
           count(*) AS cnt_1_2
    FROM ...

Query, Data Mining and Knowledge Discovery
Primitive feature extraction blocks: GLCM, Gabor, QMF, NL-STFT.
Data mining: SVM, relevance feedback, GUI.
Queries: Query Builder, SQL, query by example.
Ontologies: Strabon.
Semantic definition: semantic class DB.

Functional Concepts
Browsing: navigation in the archive for discovery. The direction and goals of the navigation change with the content of the queries. Browsing gives a global picture of the archive content.
Searching: navigation in the archive with a clear objective. The results of each query are used to direct the navigation.
Annotation: creation of new catalogue (index) entries for the archive. The results of browsing or searching may be used for annotation.

Exploration/Query Concepts
Numeric and predefined queries: classical geographic position, time, sensor type, etc. The output is a list of products within the specified parameters.
Semantic and numeric queries: with the help of supporting frames, a complex question can be formalized. The output is a list of products explained by the query sentence.
Image similarity queries: search for images similar to a given example. The output is a ranked list of images.
KDD: interactive search supported by relevance feedback mechanisms. The output is the desired product, the related products (containing similar information), semantically explained, or new, previously unknown information.
Visual Data Mining: visual exploration of the whole archive. The outputs are outliers, interesting groups, associations, etc.

Agenda (2/2)

Semantic and numerical queries
Semantic labels: the user can enter a simple label as text or select an item from the labels available in the catalogue to perform the query (for example, "forest"). These labels are predefined: they were obtained earlier as results of the image annotation process.
Query language and ontologies: queries can be performed using semantics only, or semantics combined with spatial content, entered as text or as numerical values. Queries based on spatial content use the image descriptors. The query language can rely on a query template to avoid mistyping and to help the user.
Query Builder: enables queries using semantics, topological relations, different operators, and numerical descriptors.

Query by Metadata and Ontologies
TerraSAR-X metadata used for RDF and queries: the XML annotation file contains information about productcomponent, annotation, imagedata, missioninfo, acquisitioninfo, sceneinfo, etc.
Extraction of metadata from the XML annotation file
Ontology queries
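Extracting metadata from an XML annotation file can be sketched with Python's standard `xml.etree.ElementTree` module. The sample document below is invented for illustration: only the section names `missioninfo`, `acquisitioninfo`, and `sceneinfo` come from the slide, while the child elements, their values, and the function name `extract_metadata` are assumptions and do not reproduce the real TerraSAR-X annotation schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Sequence

# Hypothetical annotation content; element names below the section level
# and all values are made up for this sketch.
SAMPLE_ANNOTATION = """
<product>
  <missioninfo><mission>TSX-1</mission></missioninfo>
  <acquisitioninfo><imagingmode>HS</imagingmode></acquisitioninfo>
  <sceneinfo><scenecenter><lat>38.2</lat></scenecenter></sceneinfo>
</product>
"""


def extract_metadata(
    xml_text: str,
    sections: Sequence[str] = ("missioninfo", "acquisitioninfo", "sceneinfo"),
) -> Dict[str, str]:
    """Collect the text of every leaf element under the requested sections.

    Keys have the form "section/leaf_tag"; intermediate nesting is dropped.
    """
    root = ET.fromstring(xml_text)
    metadata: Dict[str, str] = {}
    for section in sections:
        node = root.find(section)
        if node is None:
            continue  # section absent in this file
        for element in node.iter():
            is_leaf = len(element) == 0
            if is_leaf and element.text and element.text.strip():
                metadata[f"{section}/{element.tag}"] = element.text.strip()
    return metadata


metadata = extract_metadata(SAMPLE_ANNOTATION)
```

The resulting key/value pairs are the kind of flat records that could then be turned into RDF statements or database rows.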

Query by Semantics
Earth-Observation data model
Figure: semantic annotation of TerraSAR-X image content. Example labels: Bridge, Harbor, River deposits, Agriculture, Breaking waves, Forest and Water, Vegetation, Urban + Water, Buoy, Water and Boats, Grassland, Water and Urban, Roads, Structure roof, Railway, Building shape, Building reflection, Roads and Urban, Trees and Buildings, Water Channel, Airport, Forest, Building, Road + forest, Roads highway, Channel, Skyscrapers, Streets with buildings, Sport fields, Railway track. Several categories have numbered subtypes (Urban types 1 to 7, Forest t1 and t2, Water t1 and t2, Bridge t2).
Queries: SELECT a.label_id, l.name FROM annotation a JOIN label l ON a.label_id = l.label_id
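A query by semantic label along the lines of the join above can be tried on a toy in-memory SQLite database. The table names `annotation` and `label` and the column `label_id` come from the slide; the `patch_id` column, the sample rows, and the helper names are assumptions for this sketch, and the real system uses its own DBMS and schema.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical annotation schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE label (label_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE annotation (
            patch_id INTEGER NOT NULL,
            label_id INTEGER NOT NULL REFERENCES label(label_id)
        );
        INSERT INTO label VALUES (1, 'Bridge'), (2, 'Forest'), (3, 'Harbor');
        INSERT INTO annotation VALUES (10, 1), (11, 2), (12, 2), (13, 3);
        """
    )
    return conn


def patches_with_label(conn: sqlite3.Connection, name: str) -> List[int]:
    """Return the ids of all patches annotated with the given semantic label."""
    rows = conn.execute(
        "SELECT a.patch_id "
        "FROM annotation AS a JOIN label AS l ON a.label_id = l.label_id "
        "WHERE l.name = ? ORDER BY a.patch_id",
        (name,),  # bound parameter avoids building SQL from user text
    ).fetchall()
    return [row[0] for row in rows]


demo_conn = build_demo_db()
forest_patches = patches_with_label(demo_conn, "Forest")
```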

Query by Example
Classical approach of CBIR (Smeulders, 2000): many steps and parameters to set.
Proposed approach to content-based image retrieval: a similarity measure. Compare each object in the set to a query image Q and rank the results by their distance from Q.
Fast Compression Distance:
$$\mathrm{FCD}(x, y) = \frac{|D(x)| - \cap\big(D(x), D(y)\big)}{|D(x)|}$$
where D(x) is the dictionary extracted from x, |D(x)| is its size, and \cap(D(x), D(y)) is the number of patterns found in both dictionaries.
Cerra and Datcu, A fast compression-based similarity measure with applications to content-based image retrieval, Journal of Visual Communication and Image Representation, vol. 23, no. 2, pp. 293-302, 2012.
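The Fast Compression Distance can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes each image has already been converted to a string, collects the dictionary with an LZW-style pass, and uses the hypothetical function names `extract_dictionary`, `fcd`, and `rank_by_fcd`.

```python
from typing import List, Sequence, Set, Tuple


def extract_dictionary(text: str) -> Set[str]:
    """Collect the multi-symbol patterns an LZW-style pass would add."""
    dictionary: Set[str] = set()
    current = ""
    for symbol in text:
        candidate = current + symbol
        # Single symbols are treated as known from the start.
        if len(candidate) == 1 or candidate in dictionary:
            current = candidate
        else:
            dictionary.add(candidate)
            current = symbol
    return dictionary


def fcd(dict_x: Set[str], dict_y: Set[str]) -> float:
    """FCD(x, y) = (|D(x)| - |D(x) & D(y)|) / |D(x)|; 0 means most similar."""
    if not dict_x:
        raise ValueError("the query dictionary is empty")
    return (len(dict_x) - len(dict_x & dict_y)) / len(dict_x)


def rank_by_fcd(query: str, dataset: Sequence[str]) -> List[Tuple[int, float]]:
    """Return (dataset index, distance) pairs, nearest first."""
    dict_query = extract_dictionary(query)
    # In a real system these dictionaries would be computed once, offline.
    distances = [(i, fcd(dict_query, extract_dictionary(item))) for i, item in enumerate(dataset)]
    return sorted(distances, key=lambda pair: pair[1])


ranking = rank_by_fcd("abababab", ["cdcdcdcd", "abababab", "ababcdcd"])
```

Because the distance only needs set intersections between precomputed dictionaries, it maps naturally onto database queries such as the `inter_size` fragment shown on the DBMS slides.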

Query by example: the implementation
Offline: Y = image dataset; pre-processing (convert to HSV space, then to string); dictionary extraction; dictionary database.
Online: x = query image; extract dictionary D(x); compute distances FCD(x, Y); rank according to similarity; present the retrieved images.
$$\mathrm{FCD}(x, y) = \frac{|D(x)| - \cap\big(D(x), D(y)\big)}{|D(x)|}$$
SQL_RETRIEVE_QUERY_BY_USER_LABEL = """
WITH user_dict_size AS (
    SELECT uls.id AS id, uls.label AS label, COUNT(*) AS cnt
    FROM user_labels AS uls, user_dictionaries AS uds
    WHERE uls.label LIKE '%(user_label_label)s'
      AND uls.id = uds.user_label_id
    GROUP BY uls.id, uls.label
),
inter_size AS (
    -- size of dictionary intersection for all pairs of patches
    SELECT ud1.user_label_id AS user_label_1, ...
    FROM ...
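The pre-processing step (convert to HSV space, then to a string) might look like the following sketch. The slides do not specify how colors are turned into symbols, so the quantization into `levels` bins per channel, the character mapping, and the function names `quantize` and `pixels_to_string` are all assumptions made for illustration.

```python
import colorsys
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB


def quantize(value: float, levels: int) -> int:
    """Map a value in [0, 1] to a bin index in [0, levels - 1]."""
    return min(int(value * levels), levels - 1)


def pixels_to_string(pixels: Iterable[Pixel], levels: int = 4, base: int = ord("0")) -> str:
    """Encode each pixel as one character derived from its quantized HSV value."""
    symbols = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Combine the three bin indices into a single symbol index.
        index = (quantize(h, levels) * levels + quantize(s, levels)) * levels + quantize(v, levels)
        symbols.append(chr(base + index))
    return "".join(symbols)


red, blue = (255, 0, 0), (0, 0, 255)
encoded = pixels_to_string([red, red, blue])
```

The resulting string is what a dictionary extraction step would consume; coarser quantization gives shorter alphabets and more repeated patterns.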

Semantic annotation
Methodology: PF (primitive feature) algorithm, then classification (SVM with RF, relevance feedback), yielding the annotated category.
Elements: patches, ground truth, semantic collections.
Optimal parameters: product type (MGD), mode (High Resolution Spotlight), geometric resolution configuration (RE), patch size (160 x 160 pixels), PF algorithm (Gabor filters).
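As an illustration of a Gabor-based primitive feature, the sketch below builds a real-valued Gabor kernel, a Gaussian envelope multiplied by a cosine carrier, and measures the mean squared response of a patch to it. The parameter values, the energy measure, and the function names `gabor_kernel` and `gabor_energy` are assumptions for this example; the slides do not give the filter bank configuration actually used.

```python
import math
from typing import List

Grid = List[List[float]]


def gabor_kernel(size: int, wavelength: float, theta: float, sigma: float,
                 gamma: float = 0.5, psi: float = 0.0) -> Grid:
    """Real Gabor kernel: Gaussian envelope times a cosine carrier along theta."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center")
    half = size // 2
    kernel: Grid = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x_rot / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_energy(patch: Grid, kernel: Grid) -> float:
    """Mean squared filter response over all positions where the kernel fits."""
    k = len(kernel)
    rows, cols = len(patch) - k + 1, len(patch[0]) - k + 1
    if rows <= 0 or cols <= 0:
        raise ValueError("patch is smaller than the kernel")
    total = 0.0
    for top in range(rows):
        for left in range(cols):
            response = sum(kernel[i][j] * patch[top + i][left + j]
                           for i in range(k) for j in range(k))
            total += response ** 2
    return total / (rows * cols)


# Vertical stripes with a period of 4 pixels.
stripes = [[math.cos(2 * math.pi * x / 4.0) for x in range(8)] for _ in range(8)]
energy_matched = gabor_energy(stripes, gabor_kernel(5, 4.0, 0.0, 2.0))
energy_rotated = gabor_energy(stripes, gabor_kernel(5, 4.0, math.pi / 2, 2.0))
```

A feature vector for a patch would typically collect such energies over several orientations and wavelengths; here the kernel aligned with the stripes responds far more strongly than the rotated one.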

KDD, Active Learning and Auto-annotation
KDD: interactive search supported by relevance feedback mechanisms. The output is the desired product, the related products (containing similar information), semantically explained, or new, previously unknown information.
Goals: learn the targeted image category as accurately and as exhaustively as possible; minimize the number of iterations in the relevance feedback loop.
Figure: relevance feedback loop between the system and the user, starting from a query image.
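The relevance feedback loop with active learning can be sketched as follows. To stay self-contained, this example swaps the SVM for a much simpler nearest-centroid classifier and simulates the user with an `oracle` callback; the feature vectors, the function names, and the uncertainty criterion (ask about the sample closest to the decision boundary) are assumptions, not the method used in the system.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]


def margin(x: Vector, pos_c: Vector, neg_c: Vector) -> float:
    """Positive when x lies closer to the positive centroid."""
    return math.dist(x, neg_c) - math.dist(x, pos_c)


def relevance_feedback(features: List[Vector], labels: Dict[int, bool],
                       oracle: Callable[[int], bool], max_iterations: int = 5) -> List[bool]:
    """Iteratively ask the oracle about the most uncertain unlabeled sample.

    `labels` must start with at least one positive and one negative example.
    """
    labels = dict(labels)  # do not modify the caller's dictionary

    def fit():
        pos = centroid([features[i] for i, y in labels.items() if y])
        neg = centroid([features[i] for i, y in labels.items() if not y])
        return pos, neg

    for _ in range(max_iterations):
        pos_c, neg_c = fit()
        unlabeled = [i for i in range(len(features)) if i not in labels]
        if not unlabeled:
            break
        # Uncertainty sampling: smallest absolute margin.
        query = min(unlabeled, key=lambda i: abs(margin(features[i], pos_c, neg_c)))
        labels[query] = oracle(query)
    pos_c, neg_c = fit()
    return [margin(x, pos_c, neg_c) > 0 for x in features]


demo_features = [[0.0], [0.1], [0.2], [0.9], [1.0], [1.1]]
demo_truth = [x[0] > 0.5 for x in demo_features]
predictions = relevance_feedback(demo_features, {0: False, 5: True},
                                 oracle=lambda i: demo_truth[i])
```

Asking about the most ambiguous sample first is one way to pursue the stated goal of minimizing the number of feedback iterations.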

Warning: not all categories can be learned!

Interpretation and Understanding
Rapid mapping generation

Functions
1. Exploration and browsing: fast and controlled by Relevance Feedback
2. Search: fast reaching of the target
3. Category learning, image grouping and annotation: fast and well controlled by Relevance Feedback and Active Learning

Interpretation and Understanding
Views: semantic classes, image patches, image visualization

Damage Assessment using TerraSAR-X images
Figure: bridges before the tsunami, bridges after the tsunami, and debris caused by the tsunami.
Chart: changes per category, before and after (categories shown: flooded areas, agriculture, ocean, bridges, high voltage poles, mountains, structures, debris). Change values shown: 504, 20, 0, 8, 12, 75, 16, 2.

Visual Data Mining
Representation of the data in 3D space using a Laplacian eigenmap

Visual Data Mining
Interactive exploration and analysis of very large, highly complex, non-visual data sets.
The analyst can navigate in a multi-modal space and interactively search and refine relevance criteria to explore patterns of interest in the data and analyze them in depth.
Provides example views (projections) of the entire archive.

Visual Data Mining
Representation of the data in 3D space using a Laplacian eigenmap.
Semantically consistent groupings appear inside the data.

Visual Data Mining
Image archive view; ontology and knowledge graph.
Jain, IEEE Transactions on Multimedia, vol. 10, no. 2, February 2008.

Second Part: Hands-on evaluation of the system

Outline
Description of the data set: optical images, TerraSAR-X images, other
Instructions on how to use the system
Description of main functions
Organization of the work groups
Practical exercises

Example I: L'Aquila
Type of image: Quickbird over L'Aquila (Italy)
Number of patches: 20,880
Patch size: 100
Overlapping: not given
Features: Weber local descriptor (WLD), color histogram
Semantic classes (e.g.): forest (dense forest, sparse forest), mountains, agriculture fields, urban areas (dense urban areas, high buildings)

Example II: Different types of forest
Type of image: Ikonos panchromatic over Germany
Number of patches: 58,208
Patch size: 100x100
Overlapping: not given
Features: WLD
Semantic classes (e.g.): roads, houses, agriculture, forest I, forest II, forest III, water

Example III: Romania
Type of image: airborne
Number of patches: P1: 52,500; P2: 52,500
Patch size: 100x100
Overlapping: not given
Features: WLD, color histogram
Semantic classes (e.g.): river, cities, agriculture, etc.

Example IV: Tsunami
Type of image: TerraSAR-X GEC over Sendai (Japan)
Number of patches: 41,994
Patch size: 100x100
Overlapping: not given
Features: Weber local descriptor
Semantic classes (e.g.): flooding area, high buildings, mountains, ocean, residence area, etc.

Example V: Medical images
Type of image: optical
Number of patches: 4,389
Patch size: 80x80
Overlapping: not given
Features: WLD
Semantic classes (e.g.): class 1, class 2, class 3

Example VI: Lucas
Type of image: in-situ photos
Number of patches: 93,220
Patch size: 1600x1200
Overlapping: not given
Features: Weber local descriptor
Semantic classes: not given

Semantic definition scenario
1. Create a new project: select File/New and fill in the required information in the dialog. Press OK and wait until the full image is loaded.

Semantic definition scenario
2. Browsing the image: explore the image content using the display functions with the mouse (zoom in, zoom out, etc.). Try to identify how many different classes can be annotated. Possible classes: mountains, cities (high buildings, residential area), ocean, flood area, etc.

Semantic definition scenario
3. Iterative training: start the iterative/interactive training by giving positive examples (left-click) and negative examples (right-click) on the image content for the class that you want to define. When you are ready, press the classification button.

Semantic definition scenario
3.1 Improve the classification: the system displays the classified areas in blue. Navigate in the image looking for false alarms or misclassifications. Give more positive or negative examples and repeat until you are satisfied with the results. You can also use these patches.
Figure: first iteration.

Semantic definition scenario
3.2 Save the class: when you are ready, press the button to save the class. The system will ask you to give a name to the new class. Enter a name and then press OK.

Semantic definition scenario
4. Visualizing the defined classes: display the classes using the Visualization button.

Semantic definition scenario
5. Play with the other functionalities: for example, show the unclassified patches. Don't forget to SAVE.

Define your own classes
1. Define semantic classes for the medical image
2. Define semantic classes for the L'Aquila image
3. Define semantic classes for the TerraSAR-X image

Steps
1. Create a project
2. Browse the image file
   1. Zoom in, zoom out
   2. Try to identify how many different classes can be annotated
3. Start the iterative/interactive training
   1. Give good positive examples (left-click)
   2. Give negative examples (right-click)
4. Perform the classification. When the class is OK, save the new class
5. Visualize the annotated class
6. Delete the false alarms / improve your classification
7. Show the unclassified patches
8. SAVE the project