Annotating Spatio-Temporal Information in Documents
1 Annotating Spatio-Temporal Information in Documents. Jannik Strötgen, University of Heidelberg, Institute of Computer Science, Database Systems Research Group. June 8, 2010. Name Classification and Grounding in Multilingual Corpora, University of Zurich.
2 University of Heidelberg: oldest German university, founded in 1386; Volluniversität (comprehensive university); 12 faculties, 180 fields of study; students (20% international students); Computer Science; Computational Linguistics. June 8, 2010 Annotating Spatio-Temporal Information Jannik Strötgen 2 / 60
3 Database Systems Research Group. Major research topics include: geospatial and spatio-temporal data management; moving objects and object trajectories; processing and mining geospatial data streams; spatial and temporal information extraction; spatial and temporal information retrieval.
4 Motivation. A lot of information is published only in an unstructured format (text). Information extraction helps to identify valuable information: names, locations, dates. This information is useful for several search and exploration tasks.
5 Motivation. Query to Google: Alexander von Humboldt. More than 1 million results; a lot of unstructured information; need for help with document search and exploration.
6 Motivation. Figure: Part of the Wikipedia page Alexander von Humboldt.
7 Motivation. What is the document talking about? Events = space + time. Figure: Part of the Wikipedia page Alexander von Humboldt.
8 Motivation. Figure: Part of the Wikipedia page Alexander von Humboldt.
9 Motivation. Figure: Part of the Wikipedia page Alexander von Humboldt.
10 Motivation. Goal: extraction and exploration of spatio-temporal information in documents (extraction of events). Tasks: information extraction (temporal and spatial); a model for spatio-temporal information (events); implementation: a document processing pipeline.
11 Outline. 1 Information Extraction (Temporal Information Extraction, Spatial Information Extraction); 2 A Model for Spatio-Temporal Information (Spatio-Temporal Document Profiles); 3 Document Processing Pipeline (Yahoo Placemaker); 4 The Temporal Tagger HeidelTime; 5 Summary and Ongoing Work.
12 Information Extraction. A lot of information is published only in an unstructured format. Temporal and spatial information in documents is widespread and most valuable for search and exploration tasks. Temporal and spatial information extraction are named entity recognition and normalization tasks.
13 Temporal Information Extraction. Temporal information (Timex3): explicit: October 12, ; implicit: Columbus Day; relative: today. Extraction: identify temporal expressions with offset information. Normalization (to the Timex3 ISO standard): all expressions are normalized to their standard format; all expressions referring to the same value have an identical standard-format value.
14 Spatial Information Extraction. Spatial information: highly ambiguous (Go to Springfield in the US).
15 Spatial Information Extraction. Springfields in the United States.
16 Spatial Information Extraction. Spatial information: highly ambiguous; associated with longitude/latitude information; associated with a geometry (point or polygonal region). Extraction: identify spatial expressions with offset information. Normalization: all expressions get their longitude/latitude information; all expressions referring to the same location have identical longitude/latitude information (e.g., New York City, NYC, Big Apple).
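The normalization requirement on this slide (different surface forms of one location must resolve to identical coordinates) can be illustrated with a small lookup sketch. This is not how Yahoo Placemaker or any real geo tagger works; the gazetteer, the alias table, the function name, and the rounded coordinates are all assumptions made for the illustration.

```python
# Hypothetical mini-gazetteer: canonical name -> (longitude, latitude).
# The coordinates are rounded, illustrative values.
GAZETTEER = {
    "new york city": (-74.01, 40.71),
}

# Hypothetical alias table mapping surface forms to a canonical name.
ALIASES = {
    "nyc": "new york city",
    "big apple": "new york city",
}


def normalize_place(expression):
    """Return (longitude, latitude) for a place expression, or None if unknown."""
    key = expression.strip().lower()
    key = ALIASES.get(key, key)  # resolve an alias to its canonical name
    return GAZETTEER.get(key)


# All three surface forms resolve to the same coordinates.
nyc_forms = [normalize_place(s) for s in ("New York City", "NYC", "Big Apple")]
```

A real geo tagger additionally has to disambiguate (for example, which Springfield is meant), which a plain dictionary lookup cannot do.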
17 A Model for Spatio-Temporal Information. Document profiles: a model describing a document's information in a concise manner; a data structure that makes spatial and temporal information accessible for search and exploration tasks. Types: temporal document profiles, spatial document profiles, spatio-temporal document profiles.
18 Temporal Document Profiles. A temporal document profile tdp(d) is a sequence of tuples ⟨e_i, c_i, p_i⟩, where e_i is a temporal expression, c_i its normalized value (chronon), and p_i its offset information in the document. Example: tdp(d) = {..., ⟨January 6, 1802, , ⟩, ...}. All tuples are extracted by the temporal tagger and normalized to their standard format.
19 Spatial Document Profiles. A spatial document profile sdp(d) is a sequence of tuples ⟨g_i, v_i, p_i⟩, where g_i is a geographic expression, v_i its normalized value (longitude/latitude), and p_i its offset information in the document. Example: sdp(d) = {..., ⟨Quito, -78.5/-0.19, ⟩, ...}. All tuples are extracted by the geo tagger and normalized to their standard format.
20 Spatio-Temporal Document Profiles. Question: How to combine spatial and temporal information to extract events? Method: extraction of co-occurrences of spatial and temporal information.
21 Spatio-Temporal Document Profiles. Co-occurrence: both expressions occur in the same window of the document (e.g., paragraph or sentence). A spatio-temporal document profile stdp(d) combines the spatial and temporal information; it is a sequence of tuples ⟨e, c, g, v, p_t, p_s⟩ such that ⟨e, c, p_t⟩ is in tdp(d), ⟨g, v, p_s⟩ is in sdp(d), and p_t and p_s belong to the same window of the document.
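The definition of stdp(d) can be sketched as a join of tdp(d) and sdp(d) on the document window. The following is a minimal illustration, not the authors' implementation: the class names, the representation of windows as (start, end) character ranges, the ISO-style chronon string, and all offsets are assumptions.

```python
from typing import NamedTuple


class TemporalEntry(NamedTuple):
    expression: str  # e: temporal expression
    chronon: str     # c: normalized value (ISO-style string assumed here)
    offset: int      # p_t: character offset (assumed representation)


class SpatialEntry(NamedTuple):
    expression: str  # g: geographic expression
    lonlat: tuple    # v: (longitude, latitude)
    offset: int      # p_s: character offset (assumed representation)


def window_of(offset, windows):
    """Return the index of the (start, end) window containing offset, or None."""
    for index, (start, end) in enumerate(windows):
        if start <= offset < end:
            return index
    return None


def build_stdp(tdp, sdp, windows):
    """Pair every temporal entry with every spatial entry in the same window."""
    stdp = []
    for t in tdp:
        t_window = window_of(t.offset, windows)
        if t_window is None:
            continue
        for s in sdp:
            if window_of(s.offset, windows) == t_window:
                stdp.append((t.expression, t.chronon, s.expression,
                             s.lonlat, t.offset, s.offset))
    return stdp


# Hypothetical document with two windows (e.g., sentences) and made-up offsets.
windows = [(0, 100), (100, 200)]
tdp = [TemporalEntry("January 6, 1802", "1802-01-06", 120)]
sdp = [SpatialEntry("Cuba", (-79.5, 22.0), 40),
       SpatialEntry("Quito", (-78.5, -0.19), 150)]
stdp = build_stdp(tdp, sdp, windows)
```

In this made-up layout only Quito shares a window with the date, so stdp holds a single tuple. Choosing sentences rather than paragraphs as windows yields fewer but more precise pairs.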
22 Spatio-Temporal Document Profiles. Example: entities with normalization: te_1: ⟨November 24, , ⟩; se_1: ⟨Cuba, -79.5/22.0, ⟩; se_2: ⟨Cartagena, Colombia, -75.5/10.4, ⟩. Co-occurrences: (te_1, se_1), (te_1, se_2).
23 Spatio-Temporal Document Profiles. Example: entities with normalization: te_2: ⟨January 6, 1802, , ⟩; se_3: ⟨Magdalena, -74.5/10.0, ⟩; se_4: ⟨Cordillera Real, -78.0/0.0, ⟩; se_5: ⟨Quito, -78.5/-0.19, ⟩. Co-occurrences: (te_2, se_3), (te_2, se_4), (te_2, se_5).
24 Spatio-Temporal Document Profiles. stdp(d) = {..., ⟨e_1, , Cuba, -79.5/22.0, p_t, p_s⟩, ⟨e_1, , Cartagena, Colombia, -75.5/10.4, p_t, p_s⟩, ⟨e_2, , Magdalena, -74.5/10.0, p_t, p_s⟩, ⟨e_2, , Cordillera Real, -78.0/0.0, p_t, p_s⟩, ⟨e_2, , Quito, -78.5/-0.19, p_t, p_s⟩, ...}
25 Document Trajectory. stdp(d): a sequence of tuples ordered by time; a good model, but hard to analyze and not eye-catching. Document trajectory: a trajectory is a sequence of time/location pairs; stdp(d) can be seen as a document trajectory, i.e., a sequence of events; document trajectories can be visualized on a map.
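Reading stdp(d) as a trajectory amounts to projecting each tuple onto its time/location pair and ordering the pairs by time. A minimal sketch follows, assuming chronons are ISO-style date strings of equal granularity (so that string order equals chronological order); the function name and the second sample tuple are hypothetical.

```python
def document_trajectory(stdp):
    """Project stdp tuples (e, c, g, v, p_t, p_s) onto (c, v) pairs, sorted by time."""
    pairs = [(chronon, lonlat) for (_e, chronon, _g, lonlat, _pt, _ps) in stdp]
    # ISO-style strings such as "1802-01-06" sort chronologically as plain strings.
    return sorted(pairs, key=lambda pair: pair[0])


# Sample input: the Quito tuple uses values from the slides; the offsets and the
# whole second tuple ("Place A") are made up for illustration.
sample_stdp = [
    ("January 6, 1802", "1802-01-06", "Quito", (-78.5, -0.19), 120, 150),
    ("January 1, 1801", "1801-01-01", "Place A", (0.0, 0.0), 10, 30),
]
trajectory = document_trajectory(sample_stdp)
```

The resulting list of (time, (longitude, latitude)) pairs is the kind of structure a map visualization could draw as a path.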
26 Document Trajectory. Figure: Part of the document trajectory of Wikipedia's Humboldt page.
27 Document Trajectory. Useful for search and exploration tasks: visualization of the document's events on a map (one document, multiple documents); spatio-temporal snippets.
28 Document Processing Pipeline. Goals: flexible pipeline; corpus-independent processing pipeline; ability to integrate new components easily.
29 Document Processing Pipeline. UIMA (Unstructured Information Management Architecture): a component framework for unstructured content; helps to connect tools not built to be used together; all components work on the same data structure, the CAS object (Common Analysis Structure).
30 UIMA - Components of a Pipeline. Docs → Collection Reader → Analysis Engines → CAS Consumer → Results.
31 UIMA - Components of a Pipeline. Collection Reader: reads documents from a source (e.g., file system, database); instantiates a CAS for each document; initializes the CAS with the document text (metadata, etc.). CAS contents: doc text, metadata.
32 UIMA - Components of a Pipeline. Analysis Engines: usually several Analysis Engines analyze the document; they read the content of the CAS and add annotations to the CAS. CAS contents: doc text, metadata, annotations.
33 UIMA - Components of a Pipeline. CAS Consumer: reads the content of the CAS; does the final processing (evaluation, visualization, indexing). CAS contents: doc text, metadata, annotations.
34 UIMA - Components of a Pipeline. UIMA - What's the clue? Single components are not directly connected to each other; instead, they use the CAS. Components are independent of each other; they only have to be able to handle the CAS.
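The decoupling idea can be sketched in a few lines of plain Python. This is deliberately not the Apache UIMA API: the CAS class, the function names, and the toy engines below are hypothetical stand-ins that only mirror the roles named on the slides (Collection Reader, Analysis Engines, CAS Consumer).

```python
import re
from dataclasses import dataclass, field


@dataclass
class CAS:
    """Toy stand-in for a Common Analysis Structure: text, metadata, annotations."""
    text: str
    metadata: dict = field(default_factory=dict)
    annotations: list = field(default_factory=list)  # (type, start, end) triples


def collection_reader(docs):
    """Create one CAS per document, initialized with text and metadata."""
    for name, text in docs.items():
        yield CAS(text=text, metadata={"source": name})


def sentence_splitter(cas):
    """Naive analysis engine: a sentence ends at '.', '!' or '?'."""
    for match in re.finditer(r"[^.!?]+[.!?]?", cas.text):
        cas.annotations.append(("Sentence", match.start(), match.end()))


def year_tagger(cas):
    """Naive analysis engine: mark four-digit numbers as years."""
    for match in re.finditer(r"\b\d{4}\b", cas.text):
        cas.annotations.append(("Year", match.start(), match.end()))


def run_pipeline(docs, engines, consumer):
    """Engines never call each other; each one only reads and extends the CAS."""
    for cas in collection_reader(docs):
        for engine in engines:
            engine(cas)
        consumer(cas)


results = []


def collecting_consumer(cas):
    """Toy CAS consumer: store (source, type, covered text) for each annotation."""
    for kind, start, end in cas.annotations:
        results.append((cas.metadata["source"], kind, cas.text[start:end]))


run_pipeline({"d1": "He left in 1802. He returned in 1804."},
             [sentence_splitter, year_tagger], collecting_consumer)
```

Because the engines share nothing but the CAS, removing or reordering them requires no change to the other components.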
35 Document Processing Pipeline. Sources: Wikipedia Featured Articles, Goldstandard. Tasks: Paragraph Splitting, Sentence Splitting, Geo Tagging, Temporal Tagging, Co-occurrence Extraction. Results: Document Profiles, Evaluation Results, Document Trajectories; store results in a database.
36 Document Processing Pipeline. Collection Readers: Wiki Reader, Gold Standard Reader. Analysis Engines: Paragraph Splitter, Sentence Splitter, Geo Tagger, Temporal Tagger, Co-occurrence Extractor. CAS Consumers: Database Writer, Visualizer, Evaluator.
37 Document Processing Pipeline. Components: Sentence Splitter: OpenNLP Sentence Splitter. Geo Tagger: MetaCarta Service, Yahoo Placemaker. Temporal Tagger: own implementation.
38 Yahoo Placemaker. What is Yahoo Placemaker? A free geo-parsing web service that returns geographic metadata. Processing steps: identifies places in unstructured content; disambiguates those places; returns unique identifiers (WOEIDs).
39 Yahoo Placemaker. Supported languages: multiple languages, e.g., English, German, Italian, French, Spanish, Japanese, Chinese, ... Information on identified places: latitude/longitude information; normalized name.
40 Yahoo Placemaker. Additional information using the Yahoo GeoPlanet API: bounding box; containment information, e.g.: World Trade Center → Downtown Manhattan → New York → New York (State) → United States → Earth ...
41 The Temporal Tagger HeidelTime. HeidelTime: a rule-based system for the extraction of temporal expressions and their normalization (according to the Timex3 standard). Optimized for and evaluated within the TempEval-2 challenge.
42 The Temporal Tagger HeidelTime. The TempEval-2 challenge: Task 13 of SemEval-2010, the 5th Workshop on Semantic Evaluation. 6 tasks: extraction and normalization of temporal expressions (Task A), events (Task B), temporal relations (Tasks C-F).
43 Temporal Expressions. 4 types of semantics: Dates (April 29, 2010); Times (12 p.m.); Durations (two weeks); Sets (twice a week).
44 Temporal Expressions. 3 types of occurrences: explicit: October 12, ; implicit: Columbus Day; relative: today.
45 HeidelTime. Extraction: mainly regular expressions; other features (POS, POS of next token, etc.). Normalization: knowledge resources (names of months, holidays, etc.); linguistic clues (tense of sentences).
46 HeidelTime. Rules: every rule is a triple: expression rule, normalization function, type information. Example of a temporal expression: June 8, 2010.
47 HeidelTime. Expression rule (of type date): date_r1 = (remonth)_g1 (reday)_g2, (refullyear)_g3. Normalization function: norm_r1(g1, g2, g3) = g3 normmonth(g1) normday(g2). Expression resources: remonth = (... June July ...); reseason = (... summer ...). Normalization functions: normmonth(June) = 06; normmonth(July) = 07; normseason(summer) = SU. Normalized temporal expression: June 8,
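The rule triple on this slide (expression rule, normalization function, type) can be approximated with Python regular expressions. This is a sketch in the spirit of the slide, not HeidelTime's actual rule syntax or code; the pattern details, the helper names, and the hyphen-separated ISO-style output format are assumptions.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Expression resources (counterparts of remonth, reday, refullyear on the slide).
RE_MONTH = "(" + "|".join(MONTHS) + ")"
RE_DAY = r"(\d{1,2})"
RE_FULL_YEAR = r"(\d{4})"

# Expression rule date_r1: month (g1), day (g2), comma, four-digit year (g3).
DATE_R1 = re.compile(rf"{RE_MONTH} {RE_DAY}, {RE_FULL_YEAR}")

# Normalization resource: month name -> two-digit number, e.g. "June" -> "06".
NORM_MONTH = {name: f"{number:02d}" for number, name in enumerate(MONTHS, start=1)}


def norm_day(day):
    """Pad the day to two digits, e.g. '8' -> '08'."""
    return f"{int(day):02d}"


def norm_r1(g1, g2, g3):
    """Normalization function for date_r1 (hyphen-joined format is an assumption)."""
    return f"{g3}-{NORM_MONTH[g1]}-{norm_day(g2)}"


def tag_dates(text):
    """Return (expression, normalized value, offset) for every date_r1 match."""
    return [(match.group(0), norm_r1(*match.groups()), match.start())
            for match in DATE_R1.finditer(text)]


tagged = tag_dates("The talk was given on June 8, 2010 in Zurich.")
```

Keeping the month list in one place means that adapting the rule to another language would mainly require swapping that resource, which matches the adaptation strategy outlined later in the talk.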
48 HeidelTime: Architecture. Realized as a UIMA component; rule development within the UIMA pipeline.
49 HeidelTime: Architecture. UIMA document processing pipeline. Collection Readers: TempEval-2 Reader (TempEval-2 data), other Collection Readers (other heterogeneous sources). Analysis Engines: Sentence Splitter, Tokenizer, POS Tagger, HeidelTime, other Analysis Engines. CAS Consumers: TempEval-2 File Writer, TempEval-2 Evaluator, other Consumers. Workflows: rule design workflow, task workflow. Rule development: TempEval-2 data: training data (goldstandard); TempEval-2 Evaluator: lists of fp, fn, tp. Evaluation: TempEval-2 data: test data; TempEval-2 File Writer creates files to submit.
50 HeidelTime: Evaluation. TempEval-2: 9 systems for Task A (15 runs). HeidelTime: 2 runs: precision-optimized rule set; recall-optimized rule set.
51 HeidelTime: Evaluation. Extraction: precision-optimized: P 90 %, R 82 %, F-score 86 %; recall-optimized: P 82 %, R 91 %, F-score 86 %. Figure: Performance of participating systems (precision [%] vs. recall [%]) with F-score contour for reference.
52 HeidelTime: Evaluation. Normalization: correct value (normalized value): precision-optimized 85 %, recall-optimized 77 %; correct type (date, time, ...): precision-optimized 96 %, recall-optimized 92 %. Figure: Value normalization results [%] of participating systems (HeidelTime runs HT-1 and HT-2; other systems s-1 to s-13).
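The reported F-scores are consistent with the balanced F-measure, the harmonic mean of precision and recall. The slide only says "F-score", so taking it to be F1 is an assumption; the function name is hypothetical.

```python
def f_score(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Both HeidelTime runs from the slide, with values given as fractions.
f_precision_optimized = f_score(0.90, 0.82)
f_recall_optimized = f_score(0.82, 0.91)
```

Both values round to 86 %, which shows how the two rule sets trade precision against recall while landing on nearly the same F-score.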
53 HeidelTime: Evaluation. Evaluation results: HeidelTime: best system for the extraction task; HeidelTime: best system for the normalization task. Differences to other systems: SemEval workshop in July (at the ACL conference).
54 HeidelTime: Goals. Adaptations for other languages: new extraction resources (names of months, days, ...); new normalization functions for those expressions; new rules. Adaptations for other types of documents: TempEval: news documents; other documents: normalization is more difficult, and the document creation time is less useful for normalization.
55 Summary. Model and implementation: extraction of events (space & time); a way to organize temporal and spatial information: spatio-temporal document profiles, document trajectories. Search and exploration tasks: visualization of events; exploration of spatio-temporal snippets; similarity search using stdp; query constraints using stdp.
56 Summary. Geo tagging: several geo taggers are available; quality depends on the gazetteer used (for coverage) and the (NLP) methods used (for disambiguation). Temporal tagging: few tools are available; HeidelTime achieves good results for English.
57 Ongoing Work. Temporal tagger: adapt HeidelTime to other languages and corpora; clean up the code to make HeidelTime available. Improve the model: the co-occurrence approach ignores context; instead of co-occurrences, use NLP methods for a better understanding of syntax and semantics; new NLP components as new analysis engines. Which date belongs to which location? "In 1792 and 1797 he was in Vienna, in 1795 he made a geological and botanical tour through Switzerland and Italy."
58 Ongoing Work. Evaluation: compare different NER tools for locations; evaluate the quality of document trajectories. Enlarge the model: add Who or What to Where and When!
59 Further Reading. Spatio-temporal information: Jannik Strötgen, Michael Gertz, and Pavel Popov. Extraction and Exploration of Spatio-Temporal Information in Documents. In: GIR '10: Proceedings of the 6th Workshop on Geographic Information Retrieval, Zurich, Switzerland, February 18-19, ACM. Temporal tagger HeidelTime: Jannik Strötgen and Michael Gertz. HeidelTime: High Quality Rule-based Extraction and Normalization of Temporal Expressions. To appear in: SemEval-2010: 5th International Workshop on Semantic Evaluations (at ACL 2010), ACL.
60 Thank you for your attention!
More informationTextual Emigration Analysis
Textual Emigration Analysis Andre Blessing and Jonas Kuhn IMS - Universität Stuttgart, Germany clarin@ims.uni-stuttgart.de Abstract We present a web-based application which is called TEA (Textual Emigration
More informationInformation Retrieval
Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,
More informationDeliverable D1.4 Report Describing Integration Strategies and Experiments
DEEPTHOUGHT Hybrid Deep and Shallow Methods for Knowledge-Intensive Information Extraction Deliverable D1.4 Report Describing Integration Strategies and Experiments The Consortium October 2004 Report Describing
More informationLangforia: Language Pipelines for Annotating Large Collections of Documents
Langforia: Language Pipelines for Annotating Large Collections of Documents Marcus Klang Lund University Department of Computer Science Lund, Sweden Marcus.Klang@cs.lth.se Pierre Nugues Lund University
More informationIntroduction
Introduction EuropeanaConnect All-Staff Meeting Berlin, May 10 12, 2010 Welcome to the All-Staff Meeting! Introduction This is a quite big meeting. This is the end of successful project year Project established
More informationKAF: a generic semantic annotation format
KAF: a generic semantic annotation format Wauter Bosma & Piek Vossen (VU University Amsterdam) Aitor Soroa & German Rigau (Basque Country University) Maurizio Tesconi & Andrea Marchetti (CNR-IIT, Pisa)
More informationD4.6 Data Value Chain Database v2
D4.6 Data Value Chain Database v2 Coordinator: Fabrizio Orlandi (Fraunhofer) With contributions from: Isaiah Mulang Onando (Fraunhofer), Luis-Daniel Ibáñez (SOTON) Reviewer: Ryan Goodman (ODI) Deliverable
More informationDesign and Realization of the EXCITEMENT Open Platform for Textual Entailment. Günter Neumann, DFKI Sebastian Pado, Universität Stuttgart
Design and Realization of the EXCITEMENT Open Platform for Textual Entailment Günter Neumann, DFKI Sebastian Pado, Universität Stuttgart Textual Entailment Textual Entailment (TE) A Text (T) entails a
More informationACCELERATE YOUR SHAREPOINT ADOPTION AND ROI WITH CONTENT INTELLIGENCE
June 30, 2012 San Diego Convention Center ACCELERATE YOUR SHAREPOINT ADOPTION AND ROI WITH CONTENT INTELLIGENCE Stuart Laurie, Senior Consultant #SPSSAN Agenda 1. Challenges 2. What comes out of the box
More informationMaking Sense Out of the Web
Making Sense Out of the Web Rada Mihalcea University of North Texas Department of Computer Science rada@cs.unt.edu Abstract. In the past few years, we have witnessed a tremendous growth of the World Wide
More informationProject GRACE: A grid based search tool for the global digital library
Project GRACE: A grid based search tool for the global digital library Frank Scholze 1, Glenn Haya 2, Jens Vigen 3, Petra Prazak 4 1 Stuttgart University Library, Postfach 10 49 41, 70043 Stuttgart, Germany;
More informationBabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network
BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network Roberto Navigli, Simone Paolo Ponzetto What is BabelNet a very large, wide-coverage multilingual
More informationParmenides. Semi-automatic. Ontology. construction and maintenance. Ontology. Document convertor/basic processing. Linguistic. Background knowledge
Discover hidden information from your texts! Information overload is a well known issue in the knowledge industry. At the same time most of this information becomes available in natural language which
More informationPopulating the Semantic Web with Historical Text
Populating the Semantic Web with Historical Text Kate Byrne, ICCS Supervisors: Prof Ewan Klein, Dr Claire Grover 9th December 2008 1 Outline Overview of My Research populating the Semantic Web the Tether
More informationLinking Entities in Chinese Queries to Knowledge Graph
Linking Entities in Chinese Queries to Knowledge Graph Jun Li 1, Jinxian Pan 2, Chen Ye 1, Yong Huang 1, Danlu Wen 1, and Zhichun Wang 1(B) 1 Beijing Normal University, Beijing, China zcwang@bnu.edu.cn
More informationBetter translations with user collaboration - Integrated MT at Microsoft
Better s with user collaboration - Integrated MT at Microsoft Chris Wendt Microsoft Research One Microsoft Way Redmond, WA 98052 christw@microsoft.com Abstract This paper outlines the methodologies Microsoft
More informationLanguage Resources and Linked Data
Integrating NLP with Linked Data: the NIF Format Milan Dojchinovski @EKAW 2014 November 24-28, 2014, Linkoping, Sweden milan.dojchinovski@fit.cvut.cz - @m1ci - http://dojchinovski.mk Web Intelligence Research
More informationIntroduction to Text Mining. Hongning Wang
Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:
More informationJianyong Wang Department of Computer Science and Technology Tsinghua University
Jianyong Wang Department of Computer Science and Technology Tsinghua University jianyong@tsinghua.edu.cn Joint work with Wei Shen (Tsinghua), Ping Luo (HP), and Min Wang (HP) Outline Introduction to entity
More informationCustomisable Curation Workflows in Argo
Customisable Curation Workflows in Argo Rafal Rak*, Riza Batista-Navarro, Andrew Rowley, Jacob Carter and Sophia Ananiadou National Centre for Text Mining, University of Manchester, UK *Corresponding author:
More informationMPI-INF AT THE NTCIR-11 TEMPORAL QUERY CLASSIFICATION TASK
MPI-INF AT THE NTCIR-11 TEMPORAL QUERY CLASSIFICATION TASK Robin Burghartz Klaus Berberich Max Planck Institute for Informatics, Saarbrücken, Germany General Approach Overall strategy for TQIC subtask:
More informationsend application for a topic until Wednesday, October 25, 1pm
Overview of topics (today) send application for a topic until Wednesday, October 25, 1pm First milestone (mid/end November) prototype/part of software summary of research (literature and related systems/tools)
More informationProceedings of NTCIR-9 Workshop Meeting, December 6-9, 2011, Tokyo, Japan
Overview of the NTCIR-9 Crosslink Task: Cross-lingual Link Discovery Ling-Xiang Tang 1, Shlomo Geva 1, Andrew Trotman 2, Yue Xu 1, Kelly Y. Itakura 1 1 Faculty of Science and Technology, Queensland University
More informationInformation Retrieval
Natural Language Processing SoSe 2014 Information Retrieval Dr. Mariana Neves June 18th, 2014 (based on the slides of Dr. Saeedeh Momtazi) Outline Introduction Indexing Block 2 Document Crawling Text Processing
More informationText, Knowledge, and Information Extraction. Lizhen Qu
Text, Knowledge, and Information Extraction Lizhen Qu A bit about Myself PhD: Databases and Information Systems Group (MPII) Advisors: Prof. Gerhard Weikum and Prof. Rainer Gemulla Thesis: Sentiment Analysis
More informationCSC 5930/9010: Text Mining GATE Developer Overview
1 CSC 5930/9010: Text Mining GATE Developer Overview Dr. Paula Matuszek Paula.Matuszek@villanova.edu Paula.Matuszek@gmail.com (610) 647-9789 GATE Components 2 We will deal primarily with GATE Developer:
More informationText Mining. Representation of Text Documents
Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,
More informationA Short Introduction to CATMA
A Short Introduction to CATMA Outline: I. Getting Started II. Analyzing Texts - Search Queries in CATMA III. Annotating Texts (collaboratively) with CATMA IV. Further Search Queries: Analyze Your Annotations
More informationCross-Lingual Word Sense Disambiguation
Cross-Lingual Word Sense Disambiguation Priyank Jaini Ankit Agrawal pjaini@iitk.ac.in ankitag@iitk.ac.in Department of Mathematics and Statistics Department of Mathematics and Statistics.. Mentor: Prof.
More informationCRFVoter: Chemical Entity Mention, Gene and Protein Related Object recognition using a conglomerate of CRF based tools
CRFVoter: Chemical Entity Mention, Gene and Protein Related Object recognition using a conglomerate of CRF based tools Wahed Hemati, Alexander Mehler, and Tolga Uslu Text Technology Lab, Goethe Universitt
More informationBackground and Context for CLASP. Nancy Ide, Vassar College
Background and Context for CLASP Nancy Ide, Vassar College The Situation Standards efforts have been on-going for over 20 years Interest and activity mainly in Europe in 90 s and early 2000 s Text Encoding
More informationAdvanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016
Advanced Topics in Information Retrieval Learning to Rank Vinay Setty vsetty@mpi-inf.mpg.de Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ATIR July 14, 2016 Before we start oral exams July 28, the full
More informationMSRA Columbus at GeoCLEF 2006
MSRA Columbus at GeoCLEF 2006 Zhisheng Li, Chong Wang 2, Xing Xie 2, Wei-Ying Ma 2 Department of Computer Science, University of Sci. & Tech. of China, Hefei, Anhui, 230026, P.R. China zsli@mail.ustc.edu.cn
More informationRanked Retrieval. Evaluation in IR. One option is to average the precision scores at discrete. points on the ROC curve But which points?
Ranked Retrieval One option is to average the precision scores at discrete Precision 100% 0% More junk 100% Everything points on the ROC curve But which points? Recall We want to evaluate the system, not
More informationPig for Natural Language Processing. Max Jakob
Pig for Natural Language Processing Max Jakob Agenda 1 2 3 Introduction (speaker, affiliation, project) Named Entities pignlproc Speaker: Max Jakob MSc in Computational Linguistics Software Developer at
More informationDBpedia Spotlight at the MSM2013 Challenge
DBpedia Spotlight at the MSM2013 Challenge Pablo N. Mendes 1, Dirk Weissenborn 2, and Chris Hokamp 3 1 Kno.e.sis Center, CSE Dept., Wright State University 2 Dept. of Comp. Sci., Dresden Univ. of Tech.
More informationUnderstanding the Query: THCIB and THUIS at NTCIR-10 Intent Task. Junjun Wang 2013/4/22
Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task Junjun Wang 2013/4/22 Outline Introduction Related Word System Overview Subtopic Candidate Mining Subtopic Ranking Results and Discussion
More informationThe Edinburgh Geoparser
The Edinburgh Geoparser A Tool to Geoparse Text Beatrice Alex balex@inf.ed.ac.uk, @bea_alex Projects UK Connectivity DEEP Palimpsest LitLong GAP/GapVis The developers Claire Grover, Richard Tobin, Kate
More informationUnstructured Information Management Architecture (UIMA) Graham Wilcock University of Helsinki
Unstructured Information Management Architecture (UIMA) Graham Wilcock University of Helsinki Overview What is UIMA? A framework for NLP tasks and tools Part-of-Speech Tagging Full Parsing Shallow Parsing
More informationGIR experiements with Forostar at GeoCLEF 2007
GIR experiements with Forostar at GeoCLEF 2007 Simon Overell 1, João Magalhães 1 and Stefan Rüger 2,1 1 Multimedia & Information Systems Department of Computing, Imperial College London, SW7 2AZ, UK 2
More informationA Textual Entailment System using Web based Machine Translation System
A Textual Entailment System using Web based Machine Translation System Partha Pakray 1, Snehasis Neogi 1, Sivaji Bandyopadhyay 1, Alexander Gelbukh 2 1 Computer Science and Engineering Department, Jadavpur
More informationUIMA-based Annotation Type System for a Text Mining Architecture
UIMA-based Annotation Type System for a Text Mining Architecture Udo Hahn, Ekaterina Buyko, Katrin Tomanek, Scott Piao, Yoshimasa Tsuruoka, John McNaught, Sophia Ananiadou Jena University Language and
More informationUsing machine learning to predict temporal orientation of search engines queries in the Temporalia challenge
Using machine learning to predict temporal orientation of search engines queries in the Temporalia challenge Michele Filannino, Goran Nenadic filannim@cs.man.ac.uk, g.nenadic@manchester.ac.uk Tokyo, 11/12/2014
More informationSemantic Multimedia Information Retrieval Based on Contextual Descriptions
Semantic Multimedia Information Retrieval Based on Contextual Descriptions Nadine Steinmetz and Harald Sack Hasso Plattner Institute for Software Systems Engineering, Potsdam, Germany, nadine.steinmetz@hpi.uni-potsdam.de,
More informationSemantics Isn t Easy Thoughts on the Way Forward
Semantics Isn t Easy Thoughts on the Way Forward NANCY IDE, VASSAR COLLEGE REBECCA PASSONNEAU, COLUMBIA UNIVERSITY COLLIN BAKER, ICSI/UC BERKELEY CHRISTIANE FELLBAUM, PRINCETON UNIVERSITY New York University
More informationA tool for Cross-Language Pair Annotations: CLPA
A tool for Cross-Language Pair Annotations: CLPA August 28, 2006 This document describes our tool called Cross-Language Pair Annotator (CLPA) that is capable to automatically annotate cognates and false
More informationUsing linked data to extract geo-knowledge
Using linked data to extract geo-knowledge Matheus Silva Mota 1, João Sávio Ceregatti Longo 1 Daniel Cintra Cugler 1, Claudia Bauzer Medeiros 1 1 Institute of Computing UNICAMP Campinas, SP Brazil {matheus,joaosavio}@lis.ic.unicamp.br,
More information