Computer Support for the Analysis and Improvement of the Readability of IT-related Texts


1 Computer Support for the Analysis and Improvement of the Readability of IT-related Texts
Matthias Holdorf, , Munich
Software Engineering for Business Information Systems (sebis)
Department of Informatics
Technische Universität München, Germany
wwwmatthes.in.tum.de

2 Administrative Setup
Chair: Software Engineering for Business Information Systems
Company: QAWare GmbH
Title: Computer Support for the Analysis and Improvement of the Readability of IT-related Texts
Advisors: Bernhard Waltl, Andreas Zitzelsberger
Author: Matthias Holdorf
Start: 15 May 2016
Submission: 15 November 2016
Matthias Holdorf, sebis

3 Motivation
"Die Probe der Güte ist, dass der Leser nicht zurückzulesen hat."
("The test of quality is that the reader does not have to read back.")
Jean Paul

4 Research Approach: Design Science Approach [1], [2]
(1) Identify business problems: guided and expert interviews; show mock-ups early
(2) Develop a design for an artefact: create evaluation criteria
(3) Design and implementation: strong focus on an executable artefact; demonstrate the artefact
(4) Evaluation: guided and expert interviews

5 Process Model of Editing a Text
Activity: Write → Edit → Incorporate changes → Publish
Role: Author → Editor → Author → Quality Assurance
Text corpus: IT-related texts
Format: Word, markup
System software support: SVN

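6 Readability Formulas on Document Level [3]
Flesch Reading Ease (FRE):
$$\mathrm{FRE} = 206.835 - 1.015 \cdot \frac{\text{total words}}{\text{total sentences}} - 84.6 \cdot \frac{\text{total syllables}}{\text{total words}}$$

Score | Readability | Example
0–30 | Very difficult | Academic
30–50 | Difficult |
50–60 | Fairly difficult |
60–70 | Mediocre | 13- to 15-year-old students
70–80 | Fairly easy |
80–90 | Easy |
90–100 | Very easy | 11-year-old students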
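The formula can be computed in a few lines. The following is a minimal Python sketch, not the thesis prototype: it uses the original English constants from [3] and counts syllables with a crude vowel-group heuristic (an assumption; German text would normally need a hyphenation-based syllable count, and German adaptations of the formula use different constants). The band boundaries follow the score table of this slide; all function names are hypothetical.

```python
import re

# Score bands from the table (lower bound inclusive), checked from the top down.
BANDS = [
    (90, "very easy"),
    (80, "easy"),
    (70, "fairly easy"),
    (60, "mediocre"),
    (50, "fairly difficult"),
    (30, "difficult"),
]


def count_syllables(word: str) -> int:
    """Heuristic (assumption): one syllable per run of consecutive vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouyäöü]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease with the original English constants."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))   # average sentence length
            - 84.6 * (syllables / len(words)))        # average syllables per word


def readability_band(score: float) -> str:
    """Map a score to the readability label of the table."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "very difficult"
```

For example, "The cat sat on the mat." has six one-syllable words in one sentence, so its score is 206.835 − 6.09 − 84.6 = 116.145; very short, simple sentences can push the score above 100.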

7 Readability Formulas on Sentence Level I [4]

8 Readability Formulas on Sentence Level II [5]
Hamburger Verständlichkeitskonzept (Hamburg comprehensibility model)

9 Readability Formulas on Sentence Level III [6]
Apache UIMA Ruta: a language for rule-based text annotation.
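Ruta rules match patterns in a text and create new annotations over the matched spans. The following Python sketch imitates that idea for sentence-level checks; it is not Ruta syntax or the Ruta API. The rule names (LongSentence, NominalStyle), both thresholds and the suffix heuristic for German nominal style are hypothetical choices made only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Annotation:
    type: str
    begin: int  # character offset where the span starts
    end: int    # character offset just past the span


# Hypothetical thresholds, chosen only for illustration.
MAX_WORDS = 20
MAX_NOMINALS = 2
NOMINAL_SUFFIXES = ("ung", "heit", "keit")  # assumption: typical German noun suffixes


def words(sentence: str) -> List[str]:
    return re.findall(r"\w+", sentence)


def count_nominals(sentence: str) -> int:
    return sum(w.lower().endswith(NOMINAL_SUFFIXES) for w in words(sentence))


# Each rule pairs an annotation type with a predicate over the sentence text.
RULES: List[Tuple[str, Callable[[str], bool]]] = [
    ("LongSentence", lambda s: len(words(s)) > MAX_WORDS),
    ("NominalStyle", lambda s: count_nominals(s) > MAX_NOMINALS),
]


def sentence_spans(text: str) -> List[Tuple[int, int]]:
    # Naive splitter: from a non-space character up to the next . ! or ?
    return [(m.start(), m.end()) for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", text)]


def apply_rules(text: str) -> List[Annotation]:
    """Run every rule on every sentence and collect the resulting annotations."""
    annotations = []
    for begin, end in sentence_spans(text):
        for type_, matches in RULES:
            if matches(text[begin:end]):
                annotations.append(Annotation(type_, begin, end))
    return annotations
```

Keeping rules as data (a name plus a predicate) makes it easy to add or tune checks without touching the matching loop, which is the main appeal of a rule language such as Ruta.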

10 Sample of Edited Sentences (Mock-up)
A possible representation of readability improvements is shown during the interviews, after the interviewee has given feedback on how such improvements could be incorporated into and visualized in a document.

11 Professional Approach I
Lexical analysis | Morphological analysis | Syntactic analysis | Semantic analysis
Words | Part of speech | Grammatical relation | Entities
Syllables | Root word | Coreference | Predictions
Sentences | Grammatical case | Coherence | Meaning

12 Technical Approach I
Lexical analysis | Morphological analysis | Syntactic analysis | Semantic analysis
Tokenizer | POS tagger | Dependency parser | NER
Segmenter | Stemming | Constituency parser | SRL
Sentence splitter | Morph tagger | Chunking | Semantic predicate
(POS: part of speech; NER: named entity recognition; SRL: semantic role labeling)
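The lexical layer (tokenizer and sentence splitter) can be illustrated with a regex-based sketch that records character offsets for every token. It is deliberately naive: it does not handle abbreviations such as "z. B." or decimal numbers, whereas tools such as Stanford CoreNLP [4] use far more robust, language-specific components. The Token class and both function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    text: str
    begin: int  # offset of the first character
    end: int    # offset just past the last character


# A token is a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


def tokenize(text: str) -> List[Token]:
    return [Token(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def split_sentences(tokens: List[Token]) -> List[List[Token]]:
    """Group tokens into sentences, closing a sentence at . ! or ?"""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for tok in tokens:
        current.append(tok)
        if tok.text in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing sentence without terminal punctuation
        sentences.append(current)
    return sentences
```

Storing offsets instead of copies of the text lets later components (POS tagger, parser, rules) attach their results to exactly the same spans, which is the standoff-annotation model that UIMA also uses.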

13 Professional Approach II
Markup document → Importer → Analysis engine → Rule engine → Exporter → Annotated document, metrics
Analysis engine:
Lexical analysis: words, syllables, sentences
Morphological analysis: part of speech, word stem, grammatical case
Syntactic analysis: syntactic functions, coreference, coherence
Semantic analysis: entities, predictions, meaning

14 Technical Approach II
Markup document → Importer → Analysis engine → Rule engine → Exporter → Annotated document, metrics
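The four stages can be chained as plain functions. In this minimal sketch the importer assumes XML/HTML-like markup whose tags can simply be dropped, the analysis engine is reduced to a naive sentence split with word counts, and the rule engine holds one hypothetical rule (sentences longer than max_words). All names are illustrative, not the prototype's API.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sentence:
    text: str
    words: int


@dataclass
class Result:
    annotated_text: str
    metrics: Dict[str, float] = field(default_factory=dict)


def import_markup(markup: str) -> str:
    # Importer. Assumption: XML/HTML-like markup whose tags can simply be dropped.
    return re.sub(r"<[^>]+>", "", markup)


def analyse(text: str) -> List[Sentence]:
    # Analysis engine stand-in: naive sentence split plus a word count per sentence.
    parts = re.findall(r"[^.!?\s][^.!?]*[.!?]?", text)
    return [Sentence(p.strip(), len(re.findall(r"\w+", p))) for p in parts]


def apply_rules(sentences: List[Sentence], max_words: int) -> List[bool]:
    # Rule engine stand-in: one hypothetical rule that flags long sentences.
    return [s.words > max_words for s in sentences]


def export(sentences: List[Sentence], flags: List[bool]) -> Result:
    # Exporter: mark flagged sentences inline and report document-level metrics.
    marked = [f"[LONG] {s.text}" if flag else s.text for s, flag in zip(sentences, flags)]
    total_words = sum(s.words for s in sentences)
    metrics = {
        "sentences": float(len(sentences)),
        "avg_words_per_sentence": total_words / len(sentences) if sentences else 0.0,
        "flagged": float(sum(flags)),
    }
    return Result(" ".join(marked), metrics)


def run_pipeline(markup: str, max_words: int = 20) -> Result:
    sentences = analyse(import_markup(markup))
    return export(sentences, apply_rules(sentences, max_words))
```

Because each stage is a separate function, an importer for another input format (e.g. Word) could be swapped in without touching the analysis or rule stages.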

15 Challenge, Research Questions and Contributions
What does the process model look like for a document that is written, edited and published in an IT company? Which actor has which problems at which stage of that process?
How can we improve the readability of IT-related texts? Which readability formulas and patterns exist for the German language?
How can we make the improvements available to authors? What are the functional and non-functional requirements (evaluation criteria) for software that supports the analysis and improvement of the readability of IT-related texts?
What does a prototypical implementation that enables the analysis and improvement of the readability of IT-related texts look like? How can it be integrated into the workflow of an IT company?

16 Project Planning and Milestones
Timeline: March to November
Milestones:
Registration: 15 May
Developing theoretical foundations: 24 June
Implementation and integration finished: 23 September
Submission: 15 November
Further milestones: start of work, kick-off presentation at QAWare, kick-off presentation at the university, completion of work; further dates shown on the chart: 01 April, 29 April, 23 May, 14 October
Phases:
1. Preparatory phase: literature research and demarcation; deeper familiarization with NLP; investigation of NLP architectures and libraries; definition of a first structure of the thesis
2. Developing theoretical foundations: identify business problems (expert interviews); literature research; develop a concept; formulation in the thesis
3. Implementation: implementation of the architecture; implementation of the readability formula; demonstration of the artefact; integration of the prototype
4. Evaluation of the prototype: evaluation of the prototype (expert interviews); adjustments of the prototype
5. Evaluation and correction phase: formulation in the thesis; correction of the thesis

17 Matthias Holdorf, B.Sc. Business Informatics
Technische Universität München
Department of Informatics
Chair of Software Engineering for Business Information Systems
Boltzmannstraße, Garching bei München
Tel
matthias.holdorf@gmail.com
wwwmatthes.in.tum.de

18 Bibliography
[1] Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design science in information systems research. MIS Quarterly, 28(1).
[2] Hevner, A. (2015). Robust processes of design science research.
[3] Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3).
[4] Stanford CoreNLP (2016). A suite of core NLP tools.
[5] Schneider, W. (2001). Deutsch für Profis: Wege zu gutem Stil (23rd ed.). Munich: Wilhelm Goldmann Verlag.
[6] Apache UIMA Ruta (2016). Rule-based text annotation.

19 Backup

20 Prototype Architecture I
Importer → Analysis Engine → Rule Engine → Exporter
Analysis Engine: Tokenizer, POS-Tagger, Morph-Tagger, Dependency-Parser, each adding <<Annotations>> to the Common Analysis Structure (CAS)
Rule Engine: Ruta patterns
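In UIMA, components do not call each other; they communicate through the CAS, where every annotation carries a type and begin/end character offsets. The sketch below imitates that with a small Python class; it is not the UIMA API (in Java, uimaFIT offers a comparable JCasUtil.selectCovered). The CAS class, the annotator functions and the LongSentence pattern with its 20-token threshold are hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Annotation:
    type: str
    begin: int  # offset of the first character
    end: int    # offset just past the last character


class CAS:
    """Hypothetical stand-in for a Common Analysis Structure: text plus typed annotations."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._index: Dict[str, List[Annotation]] = defaultdict(list)

    def add(self, type_: str, begin: int, end: int) -> None:
        self._index[type_].append(Annotation(type_, begin, end))

    def select(self, type_: str) -> List[Annotation]:
        # Annotations of one type in document order.
        return sorted(self._index.get(type_, []), key=lambda a: (a.begin, a.end))

    def select_covered(self, type_: str, cover: Annotation) -> List[Annotation]:
        # Annotations of one type that lie completely inside `cover`.
        return [a for a in self.select(type_) if cover.begin <= a.begin and a.end <= cover.end]

    def covered_text(self, a: Annotation) -> str:
        return self.text[a.begin:a.end]


def sentence_annotator(cas: CAS) -> None:
    # Naive splitter: from a non-space character up to the next . ! or ?
    for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", cas.text):
        cas.add("Sentence", m.start(), m.end())


def token_annotator(cas: CAS) -> None:
    for m in re.finditer(r"\w+", cas.text):
        cas.add("Token", m.start(), m.end())


def long_sentence_pattern(cas: CAS, max_tokens: int = 20) -> None:
    # Rule-engine stand-in: reads Sentence and Token annotations, writes LongSentence.
    for sentence in cas.select("Sentence"):
        if len(cas.select_covered("Token", sentence)) > max_tokens:
            cas.add("LongSentence", sentence.begin, sentence.end)


def run(cas: CAS, components: List[Callable[[CAS], None]]) -> CAS:
    # Components run in list order and share state only through the CAS.
    for component in components:
        component(cas)
    return cas
```

Because the pattern reads Sentence and Token annotations, it must run after the annotators that create them; run applies the components in list order.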

21 Prototype Architecture II


A Multilingual Social Media Linguistic Corpus A Multilingual Social Media Linguistic Corpus Luis Rei 1,2 Dunja Mladenić 1,2 Simon Krek 1 1 Artificial Intelligence Laboratory Jožef Stefan Institute 2 Jožef Stefan International Postgraduate School 4th

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

hereby recognizes that Timotej Verbovsek has successfully completed the web course 3D Analysis of Surfaces and Features Using ArcGIS 10

hereby recognizes that Timotej Verbovsek has successfully completed the web course 3D Analysis of Surfaces and Features Using ArcGIS 10 3D Analysis of Surfaces and Features Using ArcGIS 10 Completed on September 5, 2012 3D Visualization Techniques Using ArcGIS 10 Completed on November 19, 2011 Basics of Map Projections (for ArcGIS 10)

More information

Content Enrichment. An essential strategic capability for every publisher. Enriched content. Delivered.

Content Enrichment. An essential strategic capability for every publisher. Enriched content. Delivered. Content Enrichment An essential strategic capability for every publisher Enriched content. Delivered. An essential strategic capability for every publisher Overview Content is at the centre of everything

More information

English Understanding: From Annotations to AMRs

English Understanding: From Annotations to AMRs English Understanding: From Annotations to AMRs Nathan Schneider August 28, 2012 :: ISI NLP Group :: Summer Internship Project Presentation 1 Current state of the art: syntax-based MT Hierarchical/syntactic

More information

Sustainability of Text-Technological Resources

Sustainability of Text-Technological Resources Sustainability of Text-Technological Resources Maik Stührenberg, Michael Beißwenger, Kai-Uwe Kühnberger, Harald Lüngen, Alexander Mehler, Dieter Metzing, Uwe Mönnich Research Group Text-Technological Overview

More information

Raymond P.L. Buse and Westley R. Weimer. Presenter: Rag Mayur Chevuri

Raymond P.L. Buse and Westley R. Weimer. Presenter: Rag Mayur Chevuri Raymond P.L. Buse and Westley R. Weimer {buse,weimer}@cs.virginia.edu Presenter: Rag Mayur Chevuri Maintenance consumes 70% of total life cycle cost Reading code Most time-consuming component of maintenance

More information

Parmenides. Semi-automatic. Ontology. construction and maintenance. Ontology. Document convertor/basic processing. Linguistic. Background knowledge

Parmenides. Semi-automatic. Ontology. construction and maintenance. Ontology. Document convertor/basic processing. Linguistic. Background knowledge Discover hidden information from your texts! Information overload is a well known issue in the knowledge industry. At the same time most of this information becomes available in natural language which

More information

Introduction to Computational Linguistics

Introduction to Computational Linguistics Introduction to Computational Linguistics Frank Richter fr@sfs.uni-tuebingen.de. Seminar für Sprachwissenschaft Eberhard Karls Universität Tübingen Germany Intro to CL WS 2011/12 p.1 Langenscheidt s T1

More information

Using UIMA to Structure an Open Platform for Textual Entailment. Tae-Gil Noh, Sebastian Padó Dept. of Computational Linguistics Heidelberg University

Using UIMA to Structure an Open Platform for Textual Entailment. Tae-Gil Noh, Sebastian Padó Dept. of Computational Linguistics Heidelberg University Using UIMA to Structure an Open Platform for Textual Entailment Tae-Gil Noh, Sebastian Padó Dept. of Computational Linguistics Heidelberg University The paper is about About EXCITEMENT Open Platform a

More information

AT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands

AT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands AT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands Svetlana Stoyanchev, Hyuckchul Jung, John Chen, Srinivas Bangalore AT&T Labs Research 1 AT&T Way Bedminster NJ 07921 {sveta,hjung,jchen,srini}@research.att.com

More information

A tool for Cross-Language Pair Annotations: CLPA

A tool for Cross-Language Pair Annotations: CLPA A tool for Cross-Language Pair Annotations: CLPA August 28, 2006 This document describes our tool called Cross-Language Pair Annotator (CLPA) that is capable to automatically annotate cognates and false

More information

RPI INSIDE DEEPQA INTRODUCTION QUESTION ANALYSIS 11/26/2013. Watson is. IBM Watson. Inside Watson RPI WATSON RPI WATSON ??? ??? ???

RPI INSIDE DEEPQA INTRODUCTION QUESTION ANALYSIS 11/26/2013. Watson is. IBM Watson. Inside Watson RPI WATSON RPI WATSON ??? ??? ??? @ INSIDE DEEPQA Managing complex unstructured data with UIMA Simon Ellis INTRODUCTION 22 nd November, 2013 WAT SON TECHNOLOGIES AND OPEN ARCHIT ECT URE QUEST ION ANSWERING PROFESSOR JIM HENDLER S IMON

More information

Department of Electronic Engineering FINAL YEAR PROJECT REPORT

Department of Electronic Engineering FINAL YEAR PROJECT REPORT Department of Electronic Engineering FINAL YEAR PROJECT REPORT BEngCE-2007/08-HCS-HCS-03-BECE Natural Language Understanding for Query in Web Search 1 Student Name: Sit Wing Sum Student ID: Supervisor:

More information

stanford hci group / cs376 Design Tools Ron B. Yeh 26 October 2004 Research Topics in Human-Computer Interaction

stanford hci group / cs376 Design Tools Ron B. Yeh 26 October 2004 Research Topics in Human-Computer Interaction stanford hci group / cs376 Design Tools Ron B. Yeh 26 October 2004 Research Topics in Human-Computer Interaction http://cs376.stanford.edu Reading Research Papers Selective Attention =) Or alternatively,

More information

A Linguistic Approach for Semantic Web Service Discovery

A Linguistic Approach for Semantic Web Service Discovery A Linguistic Approach for Semantic Web Service Discovery Jordy Sangers 307370js jordysangers@hotmail.com Bachelor Thesis Economics and Informatics Erasmus School of Economics Erasmus University Rotterdam

More information

Package corenlp. June 3, 2015

Package corenlp. June 3, 2015 Type Package Title Wrappers Around Stanford CoreNLP Tools Version 0.4-1 Author Taylor Arnold, Lauren Tilton Package corenlp June 3, 2015 Maintainer Taylor Arnold Provides a minimal

More information

It is recommended that you submit this work no later than Tuesday, 12 October Solution examples will be presented on 13 October.

It is recommended that you submit this work no later than Tuesday, 12 October Solution examples will be presented on 13 October. ICT/KTH 08-Oct-2010/FK (updated) id1006 Java Programming Assignment 3 - Flesch metric and commandline parsing It is recommended that you submit this work no later than Tuesday, 12 October 2010. Solution

More information

IB Event Calendar Please check regularly for updates Last Update: April 30, 2013

IB Event Calendar Please check regularly for updates Last Update: April 30, 2013 IB Event Calendar 2012-2013 Please check regularly for updates Last Update: April 30, 2013 April 2013 24: (Sophomores): Required orientation meeting for all sophomores entering into IB their junior year.

More information

Help! My Birthday Reminder Wants to Brick My Phone!

Help! My Birthday Reminder Wants to Brick My Phone! Institut für Technische Informatik und Kommunikationsnetze Master Thesis Help! My Birthday Reminder Wants to Brick My Phone! Student Name Advisor: Dr. Stephan Neuhaus, neuhaust@tik.ee.ethz.ch Professor:

More information

Draft ETSI EG V3.1.1 ( )

Draft ETSI EG V3.1.1 ( ) Draft EG 200 351 V3.1.1 (1999-07) Guide object identifier tree; Rules and registration procedures 2 Draft EG 200 351 V3.1.1 (1999-07) Reference REG/SPS-05209 (39001icq.PDF) Keywords object identifier,

More information

VTC FY19 CO-OP GOOGLE QUALIFICATIONS PARAMETERS & REIMBURSEMENT DOCUMENTATION HOW-TO

VTC FY19 CO-OP GOOGLE QUALIFICATIONS PARAMETERS & REIMBURSEMENT DOCUMENTATION HOW-TO VTC FY19 CO-OP GOOGLE QUALIFICATIONS PARAMETERS & REIMBURSEMENT DOCUMENTATION HOW-TO 1 TABLE OF CONTENTS 01 CO-OP QUALIFICATIONS 02 REQUIRED DOCUMENTATION 03 REPORTING HOW-TO 04 REIMBURSEMENT PROCESS 2

More information

Unstructured Information Management Architecture (UIMA) Graham Wilcock University of Helsinki

Unstructured Information Management Architecture (UIMA) Graham Wilcock University of Helsinki Unstructured Information Management Architecture (UIMA) Graham Wilcock University of Helsinki Overview What is UIMA? A framework for NLP tasks and tools Part-of-Speech Tagging Full Parsing Shallow Parsing

More information

Cate: A System for Analysis and Test of Java Card Applications

Cate: A System for Analysis and Test of Java Card Applications Cate: A System for Analysis and Test of Java Card Applications Peter Pfahler and Jürgen Günther Email:peter@uni-paderborn.de jguenther@orga.com Universität Paderborn, Department of Computer Science, D-33098

More information

CPA PEP 2018 Schedule and Fees

CPA PEP 2018 Schedule and Fees CPA PEP Schedule and Fees The CPA Professional Education Program (CPA PEP) is a graduatelevel program. CPA PEP comprises a series of modules that focus primarily on enhancing CPA candidates ability to

More information

Natural Language Processing Is No Free Lunch

Natural Language Processing Is No Free Lunch Natural Language Processing Is No Free Lunch STEFAN WAGNER UNIVERSITY OF STUTTGART, STUTTGART, GERMANY ntroduction o Impressive progress in NLP: OS with personal assistants like Siri or Cortan o Brief

More information

STS Infrastructural considerations. Christian Chiarcos

STS Infrastructural considerations. Christian Chiarcos STS Infrastructural considerations Christian Chiarcos chiarcos@uni-potsdam.de Infrastructure Requirements Candidates standoff-based architecture (Stede et al. 2006, 2010) UiMA (Ferrucci and Lally 2004)

More information

Lab II - Product Specification Outline. CS 411W Lab II. Prototype Product Specification For CLASH. Professor Janet Brunelle Professor Hill Price

Lab II - Product Specification Outline. CS 411W Lab II. Prototype Product Specification For CLASH. Professor Janet Brunelle Professor Hill Price Lab II - Product Specification Outline CS 411W Lab II Prototype Product Specification For CLASH Professor Janet Brunelle Professor Hill Price Prepared by: Artem Fisan Date: 04/20/2015 Table of Contents

More information

Natural Language Processing Tutorial May 26 & 27, 2011

Natural Language Processing Tutorial May 26 & 27, 2011 Cognitive Computation Group Natural Language Processing Tutorial May 26 & 27, 2011 http://cogcomp.cs.illinois.edu So why aren t words enough? Depends on the application more advanced task may require more

More information