[Demo] A webtool for analyzing land-use planning documents

Similar documents
Setup of epiphytic assistance systems with SEPIA

BoxPlot++ Zeina Azmeh, Fady Hamoui, Marianne Huchard. To cite this version: HAL Id: lirmm

YAM++ : A multi-strategy based approach for Ontology matching task

QAKiS: an Open Domain QA System based on Relational Patterns

Simulations of VANET Scenarios with OPNET and SUMO

Blind Browsing on Hand-Held Devices: Touching the Web... to Understand it Better

KeyGlasses : Semi-transparent keys to optimize text input on virtual keyboard

DANCer: Dynamic Attributed Network with Community Structure Generator

HySCaS: Hybrid Stereoscopic Calibration Software

Natural Language Based User Interface for On-Demand Service Composition

Study on Feebly Open Set with Respect to an Ideal Topological Spaces

Assisted Policy Management for SPARQL Endpoints Access Control

Teaching Encapsulation and Modularity in Object-Oriented Languages with Access Graphs

Catalogue of architectural patterns characterized by constraint components, Version 1.0

XML Document Classification using SVM

Fault-Tolerant Storage Servers for the Databases of Redundant Web Servers in a Computing Grid

Multimedia CTI Services for Telecommunication Systems

LaHC at CLEF 2015 SBS Lab

XBenchMatch: a Benchmark for XML Schema Matching Tools

An FCA Framework for Knowledge Discovery in SPARQL Query Answers

Service Reconfiguration in the DANAH Assistive System

How to simulate a volume-controlled flooding with mathematical morphology operators?

Comparison of spatial indexes

ANALEC: a New Tool for the Dynamic Annotation of Textual Data

Synthesis of fixed-point programs: the case of matrix multiplication

Change Detection System for the Maintenance of Automated Testing

Semantic Label and Structure Model based Approach for Entity Recognition in Database Context

Fuzzy sensor for the perception of colour

Scan chain encryption in Test Standards

Taking Benefit from the User Density in Large Cities for Delivering SMS

Multi-atlas labeling with population-specific template and non-local patch-based label fusion

Technical documents classification

Open Digital Forms. Hiep Le, Thomas Rebele, Fabian Suchanek. HAL Id: hal

Regularization parameter estimation for non-negative hyperspectral image deconvolution:supplementary material

Relabeling nodes according to the structure of the graph

Spectral Active Clustering of Remote Sensing Images

Linked data from your pocket: The Android RDFContentProvider

Paris-Lille-3D: A Point Cloud Dataset for Urban Scene Segmentation and Classification

Robust IP and UDP-lite header recovery for packetized multimedia transmission

Branch-and-price algorithms for the Bi-Objective Vehicle Routing Problem with Time Windows

The Connectivity Order of Links

Fueling Time Machine: Information Extraction from Retro-Digitised Address Directories

Mokka, main guidelines and future

Syrtis: New Perspectives for Semantic Web Adoption

NP versus PSPACE. Frank Vega. To cite this version: HAL Id: hal

Computing Fine-grained Semantic Annotations of Texts

FIT IoT-LAB: The Largest IoT Open Experimental Testbed

Formal modelling of ontologies within Event-B

Mapping classifications and linking related classes through SciGator, a DDC-based browsing library interface

SIM-Mee - Mobilizing your social network

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme

Linux: Understanding Process-Level Power Consumption

A N-dimensional Stochastic Control Algorithm for Electricity Asset Management on PC cluster and Blue Gene Supercomputer

Malware models for network and service management

A 64-Kbytes ITTAGE indirect branch predictor

Every 3-connected, essentially 11-connected line graph is hamiltonian

Deformetrica: a software for statistical analysis of anatomical shapes

Tacked Link List - An Improved Linked List for Advance Resource Reservation

Very Tight Coupling between LTE and WiFi: a Practical Analysis

The Proportional Colouring Problem: Optimizing Buffers in Radio Mesh Networks

Reverse-engineering of UML 2.0 Sequence Diagrams from Execution Traces

Representation of Finite Games as Network Congestion Games

Real-Time and Resilient Intrusion Detection: A Flow-Based Approach

The optimal routing of augmented cubes.

Scalewelis: a Scalable Query-based Faceted Search System on Top of SPARQL Endpoints

A Voronoi-Based Hybrid Meshing Method

Kinematic skeleton extraction based on motion boundaries for 3D dynamic meshes

IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs

The New Territory of Lightweight Security in a Cloud Computing Environment

LIG and LIRIS at TRECVID 2008: High Level Feature Extraction and Collaborative Annotation

QuickRanking: Fast Algorithm For Sorting And Ranking Data

BugMaps-Granger: A Tool for Causality Analysis between Source Code Metrics and Bugs

An Experimental Assessment of the 2D Visibility Complex

ASAP.V2 and ASAP.V3: Sequential optimization of an Algorithm Selector and a Scheduler

Comparison of radiosity and ray-tracing methods for coupled rooms

RETIN AL: An Active Learning Strategy for Image Category Retrieval

SDLS: a Matlab package for solving conic least-squares problems

Quality of Service Enhancement by Using an Integer Bloom Filter Based Data Deduplication Mechanism in the Cloud Storage Environment

Self-optimisation using runtime code generation for Wireless Sensor Networks Internet-of-Things

FStream: a decentralized and social music streamer

Graphe-Based Rules For XML Data Conversion to OWL Ontology

Comparator: A Tool for Quantifying Behavioural Compatibility

YANG-Based Configuration Modeling - The SecSIP IPS Case Study

Hierarchical Multi-Views Software Architecture

Traffic Grooming in Bidirectional WDM Ring Networks

Sewelis: Exploring and Editing an RDF Base in an Expressive and Interactive Way

A case-based reasoning approach for invoice structure extraction

Cultural micro-blog Contextualization 2016 Workshop Overview: data and pilot tasks

DSM GENERATION FROM STEREOSCOPIC IMAGERY FOR DAMAGE MAPPING, APPLICATION ON THE TOHOKU TSUNAMI

A case-based reasoning approach for unknown class invoice processing

Privacy-preserving carpooling

A Methodology for Improving Software Design Lifecycle in Embedded Control Systems

Lossless and Lossy Minimal Redundancy Pyramidal Decomposition for Scalable Image Compression Technique

Prototype Selection Methods for On-line HWR

Light field video dataset captured by a R8 Raytrix camera (with disparity maps)

Aligning Legivoc Legal Vocabularies by Crowdsourcing

Sliding HyperLogLog: Estimating cardinality in a data stream

From medical imaging to numerical simulations

Moveability and Collision Analysis for Fully-Parallel Manipulators

Structuring the First Steps of Requirements Elicitation

Transcription:

[Demo] A webtool for analyzing land-use planning documents M.A. Farvardin, Eric Kergosien, Mathieu Roche, Maguelonne Teisseire To cite this version: M.A. Farvardin, Eric Kergosien, Mathieu Roche, Maguelonne Teisseire. [Demo] A webtool for analyzing land-use planning documents. ISWC 2015, Oct 2015, Bethlehem, United States. 4 p., 2015, proceedings of the ISWC 2015. <hal-01320970> HAL Id: hal-01320970 https://hal.archives-ouvertes.fr/hal-01320970 Submitted on 24 May 2016 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

[Demo] A webtool for analyzing land-use planning documents Mohammad Amin Farvardin 1,2, Eric Kergosien 2, Mathieu Roche 1,3, and Maguelonne Teisseire 1,3 1 UMR TETIS (Irstea, Cirad, AgroParisTech), Montpellier, France amin.farvardin@teledetection.fr mathieu.roche@cirad.fr maguelonne.teisseire@teledetection.fr 2 GERiiCO, Univ. Lille 3, France eric.kergosien@univ-lille3.fr 3 LIRMM, CNRS, Univ. Montpellier, France Abstract. In previous work, different methods have been proposed in order to semi-automatically mine geospatial information and opinions in documents [3]. In this paper, we present the Web application, SentiAnnotator, based on NLP methods to extract and visualize geospatial information with the associated entities. The evaluation of our application shows good results on a French corpus, i.e. F-measure of 0.74 and 0.75 respectively for the identification of spatial features and organizations. Keywords: Land-use planning, Web application, Geospatial features 1 Introduction Researchers and experts of land-use planning are looking for decisional tools for helping them to have an overview of user s awareness on territories. In this context, we defined the Opiland method that enables to semi-automatically analyze sentiments related to land-use planning documents [3]. In this paper, we present the developed software and the associated web services for discovering and for integrating meaning in free texts available on the Web. This kind of textual data (e.g. blogs, newspapers, and so on) is generally complex but useful for public policy dialogue and decision-making. We thus propose an approach that enables (i) to automatically extract features related to land-use planning, and (ii) to give to experts the possibility of evaluating sentiments related to geospatial features, with the ultimate objective of evaluating the policy impact for adapting their decisions. The main originality of our software concerns the integration of different levels of semantics present in a document. This is really crucial in order to improve the analysis of information, specially for land-use planning domain. Generally in the opinion mining field, the connection between opinion and topic is studied. Actually in the land-use planning domain, it is necessary to take into account a larger number of relevant elements like spatial

2 Farvardin et al. features and organizations. Our software offers this possibility for helping the experts to do a finer analysis of Web data. In this demo paper, we present the SentiAnnotator web application 4 to extract different features related to land-use planning. Hereafter, in Section 2, we focus more precisely on the deployment of natural language processing methods to extract geospatial information, i.e. spatial features and organizations. The web application for uploading, indexing, and marking textual documents is detailed in Section 3. Finally, after a quick look to our system evaluation in Section 4, future work related to our project is drawn in Section 5. 2 Geospatial information extraction Named Entity Recognition (NER) methods identify different types of Named Entities (NE): dates, people, organizations, themes, numeric values, as well as locations. There is a significant number of available systems, such as OpenNLP 5, OpenCalais 6, and CasEN [6]. To recognize Named Entities several approaches are based on supervised learning methods. In this context, a bag-of-words representation is often used [1]. But this kind of statistical approach is not adapted for small data sets we are faced with in the land-use planning domain. Other approaches based on symbolic methods concern geoparsing [2, 4, 5]. The work of [5] proposes linguistic patterns to extract Spatial Features (SF) from texts. These patterns are based on a cognitive model where SF is composed of at least one NE and one variable number of spatial indicators specifying its location. Five spatial relation types are considered: orientation, distance, adjacency, inclusion, and geometric which defines union or intersection linking two SF. In our proposal, we add new patterns to improve the automatic identification of SF (absolute spatial features (A SF) and relative spatial features (R SF) [3, 5]). The SF annotation is based on the classical typology of the domain and more precisely on the sub-types of locations. Locations can be polysemous: human constructions (e.g. buildings) and addresses (e.g. streets). To take into account all these language specificities, some rules (patterns) have been added. Moreover we propose a new type of patterns to identify Organizations (OE) which is a specific NE useful for land-use planning domain. The addition of specific rules enables to identify OE which could be confused with SF in documents. Such rules are: (1) an OE is followed by an action verb; (2) an OE is proceeded by prepositions: with, by, for, on behalf of, etc. In order to manage these geospatial information, we developed the web application SentiAnnotator (http://siso.teledetection.fr/viewer.jsp). A screenshot is presented Figure 1. The web services use the Gate system 7. After uploading a corpus (in French for this current version), Spatial Features and Organizations are extracted using the implemented rules. Moreover other concepts are 4 http://siso.teledetection.fr/ 5 https://opennlp.apache.org/ 6 http://www.opencalais.com/ 7 https://gate.ac.uk/

A webtool for analyzing land-use planning documents 3 extracted using Gate: (i) thematics based on a lexicon using Agrovoc thesaurus 8, (ii) Opinions related to land-use planning domain [3]. 3 The SentiAnnotator Web application The web application (Figure 1) allows users to upload corpora, to index documents with specific web services in order to mark different kinds of information (spatial features, organizations, opinions, and themes), to visualize, to correct the results, and to download validated results in XML format. More specifically, it is possible to upload corpora (frame 1), each marked corpus is saved on the server and automatically available in the web application (frame 2). After having downloaded documents, users can select the marked features (frame 5), see the results on the selected documents in frame 3. In this frame, spatial features are in blue color, organizations in purple color, the positive opinions in green color, negative ones in red color and neutral in yellow color. By selecting different categories from frame 5, the related marked information will be kindles in frame 3 and listed by type in frame 4. In case of finding any mistakes, users can unselect marked information (frame 4). Finally expert can export the selected corrected documents by clicking the top right bottom. The downloaded corpus consists of selected documents with the marked information except those were removed by the user. The administration page allows users to upload, edit, and delete pipelines defined in the Gate format. It is also possible to remove processed corpora and to edit the uploaded pipeline rules and the available lexicons. 8 http://aims.fao.org/fr/agrovoc Fig. 1. SentiAnnotator Web application

4 Farvardin et al. 4 Evaluation Three experts of the project evaluated the process for extracting geospatial information by using SentiAnnotator application. We use a French corpus composed of 4328 words (71 spatial features and 117 organizations). The evaluations (with classical measure, i.e. Precision, Recall, and F-measure) have been investigated by comparing the manual extraction done by experts with the web service results. For SF, we obtain an excellent recall (0.91) and an acceptable precision (0.62), the F-measure is 0.74. We extract the great majority of SF but the rules still return some errors. The rules to identify OE are very efficient and return high precision (0.85) but the value of recall is lower (0.67). The F-measure for organization identification is 0.74. The rules for organization extraction seem well-adapted to the domain but they have to be extended in order to improve the recall that remains low. 5 Conclusion and Future Work In this paper, we have presented a Web application called SentiAnnotator including web services (1) to annotate corpora with features related to landuse planning, and (2) to evaluate achieved approaches with experts. Experts are using this tool for analyzing the construction project of a road around Villeveyrac (France). Future work will be dedicated to the improvements of the defined linguistics patterns for discovering NE in order to tackle the issues related to the land-use planning specificities and the multilingual aspects. We also plan to extend our approach to different types of textual contents such as tweets. Acknowledgments: The authors thank Midi Libre (French newspaper) for its expertise on the corpus and all partners of the Senterritoire project for their involvement (MSH-M, Geosud Equipex, Numev Labex, and Tectoniq PEPS project). References 1. X. Carreras, L. Marquez, and L. S. Padró. A simple named entity extractor using adaboost. In In Proceedings of CoNLL-2003, pages 152 155, 2003. 2. M. Gaio and V. Nguyen. Towards heterogeneous resources-based ambiguity reduction of sub-typed geographic named entities. In Int. Conf. of GeoSpatial Semantics, pages 217 234, 2011. 3. E. Kergosien, B. Laval, M. Roche, and M. Teisseire. Are opinions expressed in land-use planning documents? International Journal of Geographical Information Science, 28(4):739 762, 2014. 4. J. L. Leidner and M. D. Lieberman. Detecting geographical references in the form of place names and associated spatial natural language. SIGSPATIAL Special, 3(2):5 11, July 2011. 5. J. Lesbegueries, C. Sallaberry, and M. Gaio. Associating spatial patterns to textunits for summarizing geographic information. In Proceedings of ACM SIGIR 2006. Geographic Information Retrieval, Workshop, pages 40 43, 2006. 6. D. Maurel, N. Friburger, J.-Y. Antoine, I. Eshkol-Taravella, and D. Nouvel. Casen: a transducer cascade to recognize french named entities. TAL, 52(1):69 96, 2011.