Identification of Coreferential Chains in Video Texts for Semantic Annotation of News Videos


1 Identification of Coreferential Chains in Video Texts for Semantic Annotation of News Videos
Dilek Küçük (1) and Adnan Yazıcı (2)
(1) TÜBİTAK - Uzay Institute, Ankara - Turkey, dilek.kucuk@uzay.tubitak.gov.tr
(2) Dept. of Computer Eng., METU, Ankara - Turkey, yazici@ceng.metu.edu.tr

2 Outline
Introduction
Information Extraction for Semantic Annotation of News Videos
Coreferential Chains in Turkish Political News Texts
System Overview
Evaluation and Discussion
Conclusion
References

3 Introduction [1] The ever-increasing archives of broadcast news videos call for effective ways of querying them. In order to query the video data through high-level semantic entities such as objects, events, and relations, these entities should be properly extracted, and the corresponding video shots should be annotated accordingly.

4 Introduction [2] Information extraction (IE) techniques seem promising for object, relation, and event extraction from video texts, which take the form of transcription texts obtained through automatic speech recognition (ASR) techniques or closed caption texts. IE is the extraction of useful semantic information such as objects, relations, and events from free natural language texts (Grishman, 2003).

5 Introduction [3] An important point to be considered by IE systems is the anaphora phenomenon in natural language texts. Anaphora is the situation where an entity points back to another entity in the text; the entity that points back is called an anaphor, and the entity it points to is its antecedent (Mitkov, 2002). If an anaphor and its antecedent refer to the same real-world entity, they are said to be coreferential, and this situation is called coreference (Mitkov, 2002).

6 Introduction [4] In this paper, we present an approach to extract objects for semantic annotation of news videos utilizing lexical resources. Coreferential chains are identified to prevent the extraction of the same entity multiple times with different surface forms. Yet, all surface forms in the chains are preserved for further utilization during prospective semantic query evaluation.

7 Information Extraction for Semantic Annotation of News Videos [1] The Fuzzy Conceptual Model for Multimedia Data is presented in (Küçük et al., 2008).

8 Information Extraction for Semantic Annotation of News Videos [2] The proposed method aids in the automatic annotation of salient objects in video texts as follows: by utilizing a set of lexical resources, salient objects are obtained from the video texts. This is similar to the named entity recognition task of IE. Turkish political news texts are selected as the application domain. The salient named entities in this domain are mostly political people.

9 Information Extraction for Semantic Annotation of News Videos [3] Several extracted objects could refer to the same real-world entity, as when president Bush, George W. Bush, and Bush are extracted as different objects from a political news video text. This situation can be avoided by identifying the coreference chains in the texts. User queries that refer to the same object in the videos under different labels can then be processed more effectively, without manual intervention.

10 Coreferential Chains in Turkish Political News Texts (from 10

11 System Overview [1] The Extraction of Salient Entities from Turkish Political News Texts [1]
Sets of lexical resources:
Political Status (P)
Continent and Country Names (C)
City and Town Names (T)
Well-known Institutions in Turkey (W)
Turkish Proper Person Names (N)

12 System Overview [2] The Extraction of Salient Entities from Turkish Political News Texts [2] The salient entities are extracted by matching them against the pattern given as a regular expression where $M_1$ is $(\mathrm{GEN} \cup \varepsilon)$, $M_2$ is $(\mathrm{POSS} \cup \varepsilon)$, and $C_\varepsilon$, $T_\varepsilon$, $W_\varepsilon$, and $P_\varepsilon$ denote $(C \cup \varepsilon)$, $(T \cup \varepsilon)$, $(W \cup \varepsilon)$, and $(P \cup \varepsilon)$, respectively.
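A component written as "a set united with ε" in the pattern can be read as "one item from that lexical resource, or nothing", which corresponds to an optional group in a regular expression. The following Python sketch illustrates only that idea. The pattern used on the slide is not reproduced here, so the ordering of the components, the toy English lexicon entries, and the helper names (`alternation`, `optional`, `extract_entities`) are all assumptions made for illustration.

```python
import re

# Toy stand-ins for the lexical resources (assumed entries, not the real lists).
POLITICAL_STATUS = ["president", "prime minister", "minister"]  # P
COUNTRIES = ["turkey", "usa"]                                   # C
PERSON_NAMES = ["george", "bush", "abdullah", "gul"]            # N


def alternation(words):
    """Build a non-capturing alternation, longest entries first."""
    ordered = sorted(words, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(w) for w in ordered) + ")"


def optional(group):
    """Encode (X U epsilon): the group may match or be skipped."""
    return "(?:" + group + r"\s+)?"


# Hypothetical ordering: optional country, optional status, one or more names.
PATTERN = re.compile(
    r"\b"
    + optional(alternation(COUNTRIES))
    + optional(alternation(POLITICAL_STATUS))
    + alternation(PERSON_NAMES)
    + r"(?:\s+" + alternation(PERSON_NAMES) + r")*"
    + r"\b"
)


def extract_entities(text):
    """Return every match of the toy pattern in the lower-cased text."""
    return [m.group(0) for m in PATTERN.finditer(text.lower())]
```

The text is lower-cased before matching because capitalization is not relied upon in transcription texts; a real system for Turkish would also need language-aware case folding.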

13 System Overview [3] The Extraction of Salient Entities from Turkish Political News Texts [3]

14 System Overview [4] Identification of Coreferential Chains A heuristic-based coreference resolution scheme is used. Each entity in the list of salient objects is compared in turn to the previously extracted entities to check whether their tokens intersect. If the nominal form of at least one token in one entity exactly matches the nominal form of a token in the other, the two entities are said to intersect. The comparison procedure ends as soon as such an intersection is found, and a coreference link is formed between the entity under consideration and the intersecting entity.
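The token-intersection heuristic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's implementation: `nominal_form` is a placeholder that only lower-cases a token (a real system would need Turkish morphological analysis), previous entities are scanned from the most recent backwards (the scan order is not specified on the slide), and all function names are hypothetical.

```python
def nominal_form(token):
    """Placeholder: a real system would strip Turkish case and possessive suffixes."""
    return token.lower()


def tokens_intersect(entity_a, entity_b):
    """True if any token of one entity exactly matches a token of the other."""
    forms_a = {nominal_form(t) for t in entity_a.split()}
    forms_b = {nominal_form(t) for t in entity_b.split()}
    return not forms_a.isdisjoint(forms_b)


def link_entities(entities):
    """Link each entity to an earlier entity it intersects with.

    links[i] is the index of the antecedent of entities[i],
    or None if the entity starts a new chain.
    """
    links = []
    for i, entity in enumerate(entities):
        antecedent = None
        # Assumption: scan previous entities from the most recent backwards.
        for j in range(i - 1, -1, -1):
            if tokens_intersect(entity, entities[j]):
                antecedent = j
                break  # stop at the first intersection found
        links.append(antecedent)
    return links


def build_chains(entities):
    """Group entities into chains by following the coreference links."""
    links = link_entities(entities)
    chain_of = {}  # entity index -> chain index
    chains = []
    for i, antecedent in enumerate(links):
        if antecedent is None:
            chain_of[i] = len(chains)
            chains.append([entities[i]])
        else:
            chain_of[i] = chain_of[antecedent]
            chains[chain_of[i]].append(entities[i])
    return chains
```

With the example from the earlier slide, `build_chains(["president Bush", "George W. Bush", "Ankara", "Bush"])` groups the three Bush mentions into one chain while keeping every surface form. Under this sketch, two entities that share only a title token would also be linked, which is one way incorrect links can arise.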

15 Evaluation and Discussion [1] The evaluation is performed on samples from the METU Turkish Corpus (Say et al., 2002). The evaluation samples are manually annotated with SGML COREF tags, following an annotation rule adopted for this purpose. The output of the system is then compared against the manually annotated text.

16 Evaluation and Discussion [2]

17 Evaluation and Discussion [3]

18 Evaluation and Discussion [4] For the second phase, precision values are lower than recall values. The system is good at covering the coreference links, yet it also outputs several incorrect links. Most of the incorrectly annotated coreference links turn out to be cases of identity-of-sense anaphora: the referring expressions do not refer to the same real-world entity although they refer to each other. The results of the first phase are lower than those of the second phase. The main reason is the absence of some of the required information in the lexical database.
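To make the precision and recall discussion concrete, the following Python sketch computes both measures over coreference links. It assumes that a link is represented as an (anaphor index, antecedent index) pair and that system and manual links are compared as sets; the slides do not state the exact scoring procedure, so this representation and the function name `precision_recall` are assumptions.

```python
def precision_recall(system_links, gold_links):
    """Compute link-level precision and recall.

    Both arguments are iterables of (anaphor, antecedent) index pairs;
    this pair representation is an assumption made for illustration.
    """
    system = set(system_links)
    gold = set(gold_links)
    correct = len(system & gold)
    # Precision: share of system links that are also in the manual annotation.
    precision = correct / len(system) if system else 0.0
    # Recall: share of manually annotated links that the system found.
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

A system that finds every annotated link but also outputs extra, incorrect ones has full recall and reduced precision, which is the pattern described for the second phase.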

19 Conclusion [1] A text-based approach for semantic annotation of videos is presented. The approach makes use of the video texts to extract semantic entities from videos. The salient semantic entities are extracted using lexical resources. The coreferential links between the entities are identified in order to avoid superfluous extraction of the same underlying entities.

20 Conclusion [2] The approach is implemented as a semantic object extraction system. Important cues such as capitalization and punctuation marks are not utilized, since such information is usually not available in the transcription texts of news videos. The system's performance is evaluated on Turkish political news texts from the METU Turkish Corpus. As a first attempt, the evaluation results are promising; however, a number of cases turn out to need further attention.

21 References
R. Grishman, Information extraction, in The Oxford Handbook of Computational Linguistics, R. Mitkov, Ed. Oxford Univ. Press, 2003, ch. 30.
R. Mitkov, Anaphora Resolution, 1st ed. Longman, 2002.
D. Küçük, N. B. Özgür, A. Yazıcı, and M. Koyuncu, A fuzzy conceptual model for multimedia data with application to news video domain, in Proc. of the International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), 2008.
B. Say, D. Zeyrek, K. Oflazer, and U. Özge, Development of a corpus and a treebank for present-day written Turkish, in Proc. of the 11th International Conference of Turkish Linguistics, 2002.

22 Thank You
