COCKPIT FP Citizens Collaboration and Co-Creation in Public Service Delivery. Deliverable D Opinion Mining Tools 1st version

COCKPIT FP7-248222
Citizens Collaboration and Co-Creation in Public Service Delivery

Deliverable D2.1.1
Opinion Mining Tools 1st version

Editor(s): Kostas Giannakakis
Responsible Partner: ATC, INTRASOFT International S.A.
Status-Version: Final v0.4
Date: 30/06/2011
EC Distribution: Restricted to other programme participants (including the Commission Services)

Project Number: FP7-248222
Project Title: COCKPIT
Title of Deliverable: Opinion Mining Tools 1st version
Date of Delivery to the EC: 30/06/2011
Workpackage responsible for the Deliverable: WP2 Citizens Opinion Mining and Deliberation
Editor(s): Kostas Giannakakis (ATC)
Contributor(s): ATC, INTRASOFT
Reviewer(s): Ivan Ficano (ENG)
Approved by: All Partners

Abstract: This report complements the source code files delivered as part of this Deliverable, which is of type Prototype. It briefly describes the Opinion Mining software environment (1st version) for performing Web 2.0 content collection and sentiment analysis.

Keyword List: opinion mining, Web 2.0, crawler, sentiment analysis

Document Description

Document Revision History

v0.1, 20/06/2011, modified by ATC: Documentation of the report that accompanies the source code files supplied in the context of this deliverable, which is of type Prototype.
v0.2, 28/06/2011, modified by INTRASOFT, ATC: Delivery of final version of source code files.
v0.3, 28/06/2011, modified by ENG: Internal review of documentation and verification of supplied source code files. Provision of comments.
v0.4, 30/06/2011, modified by INTRASOFT: Preparation of final version. Addressing of review comments and final formatting.

Table of Contents

Executive Summary
1 Introduction
1.1 Purpose and Scope
1.2 Structure of the Document
2 Software Architecture
3 Opinion Mining Process
3.1 Training Phase
3.2 Runtime Phase
4 Software Modules
4.1 Crawler
4.2 Analysis Service
4.3 Opinion Mining Web Tool
4.4 Training and Classification Component
4.5 REST Services Interface
5 Databases
5.1 Crawler Database
5.2 Services Database

Table of Figures

Figure 1: Opinion Mining Tools and Deliberative Engagement Platform Architecture
Figure 2: Opinion Mining Conceptual Architecture
Figure 3: Opinion Mining Web Tool

Executive Summary

This is a supplementary document that describes all the material needed for the realisation of the 1st version of the Opinion Mining software environment for performing Web 2.0 content collection and sentiment analysis. A second and final release of this software component will be provided as part of Deliverable D2.1.2, before the start of the 2nd cycle of the piloting phase in M29.

1 Introduction

1.1 Purpose and Scope

In line with the overall WP2 progress plan, this document describes the Opinion Mining software environment for performing Web 2.0 content collection and sentiment analysis.

1.2 Structure of the Document

This documentation consists of the following sections:

- Section 2 describes the combined architecture of the Opinion Mining Tools and the Citizens Deliberative Engagement Platform.
- Section 3 describes the opinion mining process. The components of the system are orchestrated in a workflow that assists users to collect data relevant to a service, build a trained model and then use it to classify new opinions.
- Section 4 describes in detail the software modules of the Opinion Mining Tools.
- Section 5 describes the database schemas used by the tools.

2 Software Architecture

The Opinion Mining Tools are part of a broader system that also includes the Citizens Deliberative Engagement Platform. These two components are tightly interconnected and share common resources. Their combined architecture is depicted in Figure 1.

Figure 1: Opinion Mining Tools and Deliberative Engagement Platform Architecture
(Elements shown in the figure: Crawler, Analysis Service, Training and Classification, Crawler DB (MySQL), Services DB (MSSQL), Deliberation Platform, REST Services Interface, Opinion Mining Web Tool, DotNetNuke DB (MSSQL).)

The Opinion Mining Tools consist of the following components:

- Crawler
- Analysis Service
- Opinion Mining Web Tool
- Training and Classification Component
- REST Services Interface

These components use the following databases:

- Crawler Database (MySQL): This is the database that holds the text crawled from the web. A separate crawler database is used for each service.
- Services Database (MSSQL): This is a central database that maintains service-specific information such as crawled documents, analysis data, training and classification information, service description, associated polls, votes and comments.

3 Opinion Mining Process

Figure 2 depicts the conceptual architecture of the Opinion Mining System.

Figure 2: Opinion Mining conceptual architecture

The Opinion Mining process is performed in two phases:

- Training phase: takes as input a list of relevant URLs and the service ontology, and delivers a trained model.
- Runtime phase: takes as input a list of relevant URLs and the trained model, and constantly monitors the Web 2.0 sources for new opinions, which are automatically classified as positive or negative. The results are delivered through a web tool and/or a RESTful services interface.

Both phases are described briefly in the following sub-sections.
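The two phases can be pictured as a train/classify pair: training turns labelled opinions into a model, and the runtime phase applies that model to new text. The sketch below is only an illustration of that contract. The deliverable builds its model with RapidMiner; the small Naive Bayes classifier, the function names `train` and `classify`, and the sample sentences are all assumptions made for this example.

```python
import math
import re
from collections import Counter

LABELS = ("positive", "negative")

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def train(labelled_opinions):
    """Training phase (stand-in): build a model from (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in labelled_opinions:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    if any(doc_counts[label] == 0 for label in LABELS):
        raise ValueError("need at least one positive and one negative opinion")
    vocabulary = set().union(*word_counts.values())
    return {"word_counts": word_counts,
            "doc_counts": doc_counts,
            "vocabulary_size": len(vocabulary)}

def classify(model, text):
    """Runtime phase (stand-in): return the more probable polarity."""
    total_docs = sum(model["doc_counts"].values())
    tokens = tokenize(text)
    best_label, best_score = None, -math.inf
    for label in LABELS:
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        # Log prior of the class plus log likelihood of each token,
        # with add-one (Laplace) smoothing for unseen words.
        score = math.log(model["doc_counts"][label] / total_docs)
        for token in tokens:
            score += math.log((counts[token] + 1) /
                              (total_words + model["vocabulary_size"]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, for illustration only.
sample_training = [
    ("the new bus service is great and reliable", "positive"),
    ("excellent helpful staff", "positive"),
    ("terrible delays and rude staff", "negative"),
    ("the service is slow and awful", "negative"),
]
sample_model = train(sample_training)
print(classify(sample_model, "great reliable service"))
```

Keeping the model as a plain value returned by `train` mirrors the description above, where the trained model is an artefact produced once and reused at runtime.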

3.1 Training phase

The training phase starts with the COCKPIT Crawler being configured to collect documents from URLs relevant to a service. For each service, a different instance of the COCKPIT Crawler is set up. The collected documents are analysed and a score is attached to each of them. This score is calculated from the number of adjectives and ontology terms found in the text, and indicates whether the document is likely to present an opinion regarding the service in question.

After a considerable number of documents has been collected, the users are asked to go through them and select at least 100 negative and positive opinions. The selection is aided by the Opinion Mining Web Tool, which presents all collected documents in a tabular form and can sort them by score.

Figure 3: Opinion Mining Web Tool

When an adequate number of opinions has been collected, the web tool allows the user to create the trained model. The trained model is created with RapidMiner and is stored, so that it can later be used in the runtime phase.

3.2 Runtime phase

The runtime phase uses the trained model created during the training phase to classify new opinions automatically. The crawler service remains active and keeps collecting new documents. A classification service uses RapidMiner to classify the new opinions and stores the results in a database.

A user can monitor the results through the Opinion Mining Web Tool. The following information is offered:

- Total number of collected documents referring to a service
- Percentage of positive opinions
- Percentage of negative opinions

A time range of interest can be defined (start and stop date). The text of each individual document can also be reviewed. The results are also made available through the RESTful Web Services Interface.
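The monitoring figures listed in Section 3.2 amount to a simple aggregation over classified documents within a date range. A minimal sketch follows, assuming each classified document carries a collection date and a polarity label. The `ClassifiedDocument` type, its field names and the `summarise` function are hypothetical and do not reflect the actual Services Database schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ClassifiedDocument:
    collected_on: date   # assumed field: when the crawler stored the document
    polarity: str        # assumed field: "positive" or "negative"
    text: str

def summarise(documents, start, stop):
    """Return the total and the positive/negative percentages for
    documents collected between start and stop (both inclusive)."""
    selected = [d for d in documents if start <= d.collected_on <= stop]
    total = len(selected)
    if total == 0:
        # Avoid dividing by zero when the range contains no documents.
        return {"total": 0, "positive_pct": 0.0, "negative_pct": 0.0}
    positive = sum(1 for d in selected if d.polarity == "positive")
    negative = sum(1 for d in selected if d.polarity == "negative")
    return {"total": total,
            "positive_pct": 100.0 * positive / total,
            "negative_pct": 100.0 * negative / total}
```

Treating both range boundaries as inclusive is a choice made for this sketch; the source does not say how the web tool handles the start and stop dates.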

4 Software Modules

4.1 Crawler

Type: Java Service
Description: The crawler is based on the open-source sync3crawler. The original crawler has been extended to better support multi-lingual texts. The crawler accepts as input a list of RSS feeds and constantly monitors them for new data. When new data are found, the crawler fetches the content, strips the HTML tags (using the boilerpipe library) and stores the cleaned text in a MySQL database. For each service, a different instance of the crawler and of the database is deployed.
Source Code: CockpitCrawler.zip

4.2 Analysis Service

Type: .NET Service
Description: This is a service that runs periodically and collects new data from the crawler databases. The new data are analysed and inserted into the Services Database. The data analysis involves searching the cleaned text for terms relevant to a service and for adjectives. This information is used by the Opinion Mining Web Tool to provide suggestions to the user during the training phase.
Source Code: AnalysisService.zip

4.3 Opinion Mining Web Tool

Type: ASP.NET Web Application
Description: The Opinion Mining Web Tool is a Web Application that is integrated into the SE tool and assists users to:

- Configure the Opinion Mining Tool by providing the service-related terms and URLs.
- Select negative and positive opinions from the crawled documents. The Opinion Mining Web Tool assists the users in this task by providing suggestions.
- Initiate the Training and Classification phases.
- Monitor the results of the Classification phase.

The Web Tool for a specific service is available at http://paris.atc.gr/opinionmining/?service=1
Source Code: OpinionMining.zip

4.4 Training and Classification Component

Type: RapidMiner Application
Description: The Training and Classification Component is based on RapidMiner. In the Training Phase, the set of negative and positive opinions selected by the users is read from the Services Database and a classification model is created. The classification model is used during the Classification Phase to provide polarity predictions for the new documents inserted into the Services Database.
Source Code: TrainingClassification.zip

4.5 REST Services Interface

Type: ASP.NET Web Application
Description: This application implements the REST Services Interface for the Deliberation Platform and the Opinion Mining Tool. This interface connects the Opinion Mining Tool and the Deliberation Platform with the other components of the COCKPIT System. The home page of the services is available at http://paris.atc.gr/cockpit-services
Source Code: CockpitServices.zip
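The analysis step of Section 4.2, which searches the cleaned text for service-related terms and adjectives, feeds the document score that the web tool uses to sort candidates in Section 3.1. The source does not give the scoring formula, so the sketch below simply assumes that the score is the sum of the two counts. It also assumes single-word ontology terms and a fixed adjective list instead of real part-of-speech tagging. The names `score_document` and `rank_candidates` are hypothetical.

```python
import re

def score_document(text, ontology_terms, adjectives):
    """Count adjective and ontology-term occurrences in the text.

    Assumption: the score is the plain sum of both counts; the
    actual weighting used by the Analysis Service is not documented.
    """
    tokens = re.findall(r"\w+", text.lower())
    adjective_hits = sum(1 for token in tokens if token in adjectives)
    term_hits = sum(1 for token in tokens if token in ontology_terms)
    return adjective_hits + term_hits

def rank_candidates(documents, ontology_terms, adjectives):
    """Sort documents so that likely opinions come first."""
    return sorted(documents,
                  key=lambda doc: score_document(doc, ontology_terms, adjectives),
                  reverse=True)

# Hypothetical vocabulary for a public transport service.
sample_terms = {"bus", "ticket"}
sample_adjectives = {"slow", "expensive", "great"}
sample_docs = [
    "Weather report for Monday",
    "The bus was slow and the ticket expensive",
]
print(rank_candidates(sample_docs, sample_terms, sample_adjectives)[0])
```

A document that mentions the service and uses several adjectives rises to the top of the list, which is the kind of suggestion the web tool is described as offering during opinion selection.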

5 Databases

5.1 Crawler Database

Description: MySQL. This is the database that holds the text crawled from the web. For each service, a different crawler database instance is deployed.
Schema: CockPitCrawler.sql

5.2 Services Database

Description: Microsoft SQL Server. This is a central database that maintains all information including the services. For each service, the corresponding crawled documents, analysis, training and classification information, service description, associated poll, votes and comments are stored.
Schema: Cockpit.sql