Identification of Coreferential Chains in Video Texts for Semantic Annotation of News Videos

1 Identification of Coreferential Chains in Video Texts for Semantic Annotation of News Videos
Dilek Küçük (1) and Adnan Yazıcı (2)
(1) TÜBİTAK-Uzay Institute, Ankara, Turkey, dilek.kucuk@uzay.tubitak.gov.tr
(2) Dept. of Computer Eng., METU, Ankara, Turkey, yazici@ceng.metu.edu.tr

2 Outline: Introduction; Information Extraction for Semantic Annotation of News Videos; Coreferential Chains in Turkish Political News Texts; System Overview; Evaluation and Discussion; Conclusion; References

3 Introduction [1] The ever-increasing archives of broadcast news videos call for effective ways of querying them. In order to query the video data through high-level semantic entities such as objects, events, and relations, these entities should be properly extracted, and the corresponding video shots should be annotated accordingly.

4 Introduction [2] Information extraction (IE) techniques seem promising for object, relation, and event extraction from video texts, which take the form of transcription texts obtained through automatic speech recognition (ASR) techniques or of closed caption texts. IE is the extraction of useful semantic information such as objects, relations, and events from free natural language texts (Grishman, 2003).

5 Introduction [3] An important point to be considered by IE systems is the anaphora phenomenon in natural language texts. Anaphora is the situation where an entity points back to another entity in the text; the entity that points back is called an anaphor, and the entity it points to is its antecedent (Mitkov, 2002). If an anaphor and its antecedent refer to the same real-world entity, they are said to be coreferential, and this situation is called coreference (Mitkov, 2002).

6 Introduction [4] In this paper, we present an approach to extract objects for semantic annotation of news videos utilizing lexical resources. Coreferential chains are identified to prevent the extraction of the same entity multiple times with different surface forms. Yet, all surface forms in the chains are preserved for further utilization during prospective semantic query evaluation.

7 Information Extraction for Semantic Annotation of News Videos [1] [Figure: the Fuzzy Conceptual Model for Multimedia Data presented in (Küçük et al., 2008).]

8 Information Extraction for Semantic Annotation of News Videos [2] The proposed method aids in the automatic annotation of salient objects in video texts as follows: salient objects are obtained from the video texts by utilizing a set of lexical resources, similar to the named entity recognition task of IE. Turkish political news texts are selected as the application domain, and the salient named entities in this domain are mostly political figures.

9 Information Extraction for Semantic Annotation of News Videos [3] The extracted objects may refer to the same real-world entity, as when president Bush, George W. Bush, and Bush are extracted as different objects from a political news video text. This situation can be avoided by identifying the coreference chains in the texts. User queries that retrieve the same object in the videos under different labels are thereby processed more effectively, without manual intervention.

10 Coreferential Chains in Turkish Political News Texts [Figure not preserved; its source attribution, "(from", is truncated.]

11 System Overview [1] The Extraction of Salient Entities from Turkish Political News Texts [1] Sets of lexical resources: Political Status (P); Continent and Country Names (C); City and Town Names (T); Well-known Institutions in Turkey (W); Turkish Proper Person Names (N).

12 System Overview [2] The Extraction of Salient Entities from Turkish Political News Texts [2] The salient entities are extracted by matching the text against a pattern given as a regular expression, where M1 is (GEN ∪ ε), M2 is (POSS ∪ ε), and Cε, Tε, Wε, and Pε denote (C ∪ ε), (T ∪ ε), (W ∪ ε), and (P ∪ ε), respectively.
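The idea of matching tokens against lexicon classes, with each optional class (a set united with ε, the empty string) becoming an optional element of a regular expression, can be sketched as below. This is only an illustration: the regular expression itself is not reproduced in the transcription, so the ordering of classes in PATTERN, the tiny English lexicons, and the names tag and extract are all assumptions, and the morphological markers M1 and M2 are left out.

```python
import re

# Tiny hypothetical lexicons keyed by the class letters used in the slides.
# The real resources are not listed in the source.
LEXICONS = {
    "P": {"president", "minister"},   # political status
    "C": {"usa", "turkey"},           # continent and country names
    "T": {"ankara"},                  # city and town names
    "W": {"parliament"},              # well-known institutions
    "N": {"george", "bush"},          # proper person names
}

def tag(token):
    """Map a token to its one-letter lexicon class, or 'O' if it is in none."""
    for label, words in LEXICONS.items():
        if token.lower() in words:
            return label
    return "O"

# ASSUMED ordering, for illustration only: each optional class (X united
# with the empty string) becomes "X?", followed by one or more person names.
PATTERN = re.compile(r"C?T?W?P?N+")

def extract(tokens):
    """Return the token spans whose class sequence matches PATTERN."""
    tags = "".join(tag(t) for t in tokens)  # one character per token
    return [" ".join(tokens[m.start():m.end()]) for m in PATTERN.finditer(tags)]

print(extract("yesterday usa president george bush visited ankara".split()))
```

Because every class is encoded as a single character, match offsets in the tag string map directly back to token positions.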

13 System Overview [3] The Extraction of Salient Entities from Turkish Political News Texts [3]

14 System Overview [4] Identification of Coreferential Chains. A heuristic-based coreference resolution scheme is used. Each entity in the list of salient objects is compared in turn to the previously extracted entities to check whether their tokens intersect. If the nominal form of at least one token in one entity exactly matches the nominal form of a token in the other, the two entities are said to intersect. The comparison ends as soon as such an intersection is found, and a coreference link is formed between the entity under consideration and the intersecting entity.
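The token-intersection heuristic described on this slide can be sketched roughly as follows. The function names are hypothetical, nominal_form is a crude stand-in for real morphological analysis (it lowercases and drops anything after an apostrophe), and scanning earlier entities in extraction order is an assumption, since the slide only says "in turn".

```python
def nominal_form(token):
    """Crude stand-in for morphological analysis (an assumption, not the
    authors' analyzer): lowercase and drop any apostrophe-attached suffix."""
    return token.lower().split("'")[0]

def link_entities(entities):
    """For each entity, return the index of the first earlier entity that
    shares a token in nominal form, or None if there is no such entity."""
    links = []
    seen = []  # token sets of the entities processed so far
    for entity in entities:
        tokens = {nominal_form(t) for t in entity.split()}
        antecedent = None
        for j, previous in enumerate(seen):
            if tokens & previous:      # at least one token matches exactly
                antecedent = j
                break                  # stop at the first intersection
        links.append(antecedent)
        seen.append(tokens)
    return links

def build_chains(entities):
    """Group entities into chains by following the links, keeping every
    surface form in its chain."""
    chains, chain_of = [], {}
    for i, antecedent in enumerate(link_entities(entities)):
        if antecedent is None:
            chain_of[i] = len(chains)
            chains.append([entities[i]])
        else:
            chain_of[i] = chain_of[antecedent]
            chains[chain_of[i]].append(entities[i])
    return chains

print(build_chains(["president Bush", "George W. Bush", "Bush", "Ankara"]))
```

Keeping all surface forms inside each chain mirrors the stated goal of preserving them for later query evaluation while extracting the underlying entity only once.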

15 Evaluation and Discussion [1] The evaluation is performed on samples from the METU Turkish Corpus (Say et al., 2002). The evaluation samples are manually annotated with the SGML COREF tag, following an annotation rule adopted for this purpose. The output of the system is then compared against the manually annotated text.
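Comparing system output against the manual annotation can be sketched as a precision and recall computation over coreference links. The slides do not spell out the exact scoring procedure, so the representation of a link as a (mention index, antecedent index) pair and the function name are assumptions made for illustration.

```python
def precision_recall(system_links, gold_links):
    """Precision and recall of system links against gold links.
    Each link is a hashable pair (mention_index, antecedent_index)."""
    system, gold = set(system_links), set(gold_links)
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical example: the system finds both gold links plus one wrong one,
# so recall is perfect while precision drops.
gold = [(1, 0), (2, 0)]
system = [(1, 0), (2, 0), (3, 1)]
precision, recall = precision_recall(system, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```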

16 Evaluation and Discussion [2]

17 Evaluation and Discussion [3]

18 Evaluation and Discussion [4] For the second phase, precision values are lower than recall values: the system is good at covering the coreference links, yet it also outputs several incorrect links. Most of the incorrectly annotated coreference links turn out to be cases of identity-of-sense anaphora, where the referring expressions do not refer to the same real-world entity although they refer to each other. The results of the first phase are lower than those of the second phase. The main reason is the absence of some of the required information in the lexical database.

19 Conclusion [1] A text-based approach for semantic annotation of videos is presented. The approach makes use of the video texts to extract semantic entities from videos. The salient semantic entities are extracted using lexical resources. The coreferential links between the entities are identified in order to avoid superfluous extraction of the same underlying entities.

20 Conclusion [2] The approach is implemented as a semantic object extraction system. Important cues such as capitalization and punctuation marks are not utilized, since such information is usually not available in the transcription texts of news videos. The system's performance is evaluated on Turkish political news texts from the METU Turkish Corpus. As a first attempt, the evaluation results are promising; however, a number of cases need further attention.

21 References
R. Grishman, Information extraction, in The Oxford Handbook of Computational Linguistics, R. Mitkov, Ed. Oxford Univ. Press, 2003, ch. 30.
R. Mitkov, Anaphora Resolution, 1st ed. Longman, 2002.
D. Küçük, N. B. Özgür, A. Yazıcı, and M. Koyuncu, A fuzzy conceptual model for multimedia data with application to news video domain, in Proc. of the International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), 2008.
B. Say, D. Zeyrek, K. Oflazer, and U. Özge, Development of a corpus and a treebank for present-day written Turkish, in Proc. of the 11th International Conference of Turkish Linguistics, 2002.

22 Thank You


More information

Utilizing Semantic Word Similarity Measures for Video Retrieval

Utilizing Semantic Word Similarity Measures for Video Retrieval Utilizing Semantic Word Similarity Measures for Video Retrieval Yusuf Aytar Computer Vision Lab, University of Central Florida yaytar@cs.ucf.edu Mubarak Shah Computer Vision Lab, University of Central

More information

The Dictionary Parsing Project: Steps Toward a Lexicographer s Workstation

The Dictionary Parsing Project: Steps Toward a Lexicographer s Workstation The Dictionary Parsing Project: Steps Toward a Lexicographer s Workstation Ken Litkowski ken@clres.com http://www.clres.com http://www.clres.com/dppdemo/index.html Dictionary Parsing Project Purpose: to

More information

Comp 336/436 - Markup Languages. Fall Semester Week 2. Dr Nick Hayward

Comp 336/436 - Markup Languages. Fall Semester Week 2. Dr Nick Hayward Comp 336/436 - Markup Languages Fall Semester 2017 - Week 2 Dr Nick Hayward Digitisation - textual considerations comparable concerns with music in textual digitisation density of data is still a concern

More information

A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP

A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP Rini John and Sharvari S. Govilkar Department of Computer Engineering of PIIT Mumbai University, New Panvel, India ABSTRACT Webpages

More information

Query Optimization. Shuigeng Zhou. December 9, 2009 School of Computer Science Fudan University

Query Optimization. Shuigeng Zhou. December 9, 2009 School of Computer Science Fudan University Query Optimization Shuigeng Zhou December 9, 2009 School of Computer Science Fudan University Outline Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational

More information

Minimal-Impact Personal Audio Archives

Minimal-Impact Personal Audio Archives Minimal-Impact Personal Audio Archives Dan Ellis, Keansub Lee, Jim Ogle Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu

More information

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion Sara Lana-Serrano 1,3, Julio Villena-Román 2,3, José C. González-Cristóbal 1,3 1 Universidad Politécnica de Madrid 2 Universidad

More information

A Test Environment for Natural Language Understanding Systems

A Test Environment for Natural Language Understanding Systems A Test Environment for Natural Language Understanding Systems Li Li, Deborah A. Dahl, Lewis M. Norton, Marcia C. Linebarger, Dongdong Chen Unisys Corporation 2476 Swedesford Road Malvern, PA 19355, U.S.A.

More information

Integrated Querying of Images by Color, Shape, and Texture Content of Salient Objects

Integrated Querying of Images by Color, Shape, and Texture Content of Salient Objects Integrated Querying of Images by Color, Shape, and Texture Content of Salient Objects Ediz Şaykol, Uğur Güdükbay, and Özgür Ulusoy Department of Computer Engineering, Bilkent University 06800 Bilkent,

More information

MMAX2 Annotation Tool Quick Start Guide

MMAX2 Annotation Tool Quick Start Guide MMAX2 Annotation Tool Quick Start Guide c Christoph Müller EML Research ggmbh http://mmax.eml-research.de 1st February 2005 Contents 1 About this Document 3 2 Installing MMAX2 (Updated for version 1.0

More information

Text Recognition in Videos using a Recurrent Connectionist Approach

Text Recognition in Videos using a Recurrent Connectionist Approach Author manuscript, published in "ICANN - 22th International Conference on Artificial Neural Networks, Lausanne : Switzerland (2012)" DOI : 10.1007/978-3-642-33266-1_22 Text Recognition in Videos using

More information

From Multimedia Retrieval to Knowledge Management. Pedro J. Moreno JM Van Thong Beth Logan

From Multimedia Retrieval to Knowledge Management. Pedro J. Moreno JM Van Thong Beth Logan From Multimedia Retrieval to Knowledge Management Pedro J. Moreno JM Van Thong Beth Logan CRL 2002/02 March 2002 From Multimedia Retrieval to Knowledge Management Pedro J. Moreno JM Van Thong Beth Logan

More information

Semantic and Multimodal Annotation. CLARA University of Copenhagen August 2011 Susan Windisch Brown

Semantic and Multimodal Annotation. CLARA University of Copenhagen August 2011 Susan Windisch Brown Semantic and Multimodal Annotation CLARA University of Copenhagen 15-26 August 2011 Susan Windisch Brown 2 Program: Monday Big picture Coffee break Lexical ambiguity and word sense annotation Lunch break

More information

Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library

Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library Nuno Freire Chief data officer The European Library Pacific Neighbourhood Consortium 2014 Annual

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Beyond Bag of Words Bag of Words a document is considered to be an unordered collection of words with no relationships Extending

More information

YUSUF AYTAR B.S. Ege University

YUSUF AYTAR B.S. Ege University SEMANTIC VIDEO RETRIEVAL USING HIGH LEVEL CONTEXT by YUSUF AYTAR B.S. Ege University A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in the School of Electrical

More information

Narrative Schema as World Knowledge for Coreference Resolution

Narrative Schema as World Knowledge for Coreference Resolution Narrative Schema as World Knowledge for Coreference Resolution Joseph Irwin Nara Institute of Science and Technology Nara Prefecture, Japan joseph-i@is.naist.jp Mamoru Komachi Nara Institute of Science

More information

A cocktail approach to the VideoCLEF 09 linking task

A cocktail approach to the VideoCLEF 09 linking task A cocktail approach to the VideoCLEF 09 linking task Stephan Raaijmakers Corné Versloot Joost de Wit TNO Information and Communication Technology Delft, The Netherlands {stephan.raaijmakers,corne.versloot,

More information

Corpus Linguistics: corpus annotation

Corpus Linguistics: corpus annotation Corpus Linguistics: corpus annotation Karën Fort karen.fort@inist.fr November 30, 2010 Introduction Methodology Annotation Issues Annotation Formats From Formats to Schemes Sources Most of this course

More information

CS 4201 Compilers 2014/2015 Handout: Lab 1

CS 4201 Compilers 2014/2015 Handout: Lab 1 CS 4201 Compilers 2014/2015 Handout: Lab 1 Lab Content: - What is compiler? - What is compilation? - Features of compiler - Compiler structure - Phases of compiler - Programs related to compilers - Some

More information

Text Mining for Software Engineering

Text Mining for Software Engineering Text Mining for Software Engineering Faculty of Informatics Institute for Program Structures and Data Organization (IPD) Universität Karlsruhe (TH), Germany Department of Computer Science and Software

More information

Natural Language Processing. SoSe Question Answering

Natural Language Processing. SoSe Question Answering Natural Language Processing SoSe 2017 Question Answering Dr. Mariana Neves July 5th, 2017 Motivation Find small segments of text which answer users questions (http://start.csail.mit.edu/) 2 3 Motivation

More information

ELEC 876: Software Reengineering

ELEC 876: Software Reengineering ELEC 876: Software Reengineering () Dr. Ying Zou Department of Electrical & Computer Engineering Queen s University Compiler and Interpreter Compiler Source Code Object Compile Execute Code Results data

More information

On Reduct Construction Algorithms

On Reduct Construction Algorithms 1 On Reduct Construction Algorithms Yiyu Yao 1, Yan Zhao 1 and Jue Wang 2 1 Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 {yyao, yanzhao}@cs.uregina.ca 2 Laboratory

More information

EUDICO, Annotation and Exploitation of Multi Media Corpora over the Internet

EUDICO, Annotation and Exploitation of Multi Media Corpora over the Internet EUDICO, Annotation and Exploitation of Multi Media Corpora over the Internet Hennie Brugman, Albert Russel, Daan Broeder, Peter Wittenburg Max Planck Institute for Psycholinguistics P.O. Box 310, 6500

More information