Text, Knowledge, and Information Extraction Lizhen Qu
A bit about Myself PhD: Databases and Information Systems Group (MPII) Advisors: Prof. Gerhard Weikum and Prof. Rainer Gemulla Thesis: Sentiment Analysis with Limited Training Data Now: machine learning group at NICTA, adjunct research fellow at ANU.
Macquarie
News about Macquarie Bank
Negative News about Macquarie Bank
Simple Math Problem: Bob has 15 apples. He gives 9 to Sarah. How many apples does Bob have now?
Information Extraction: named entity recognition, named entity disambiguation, relation extraction.
Knowledge Bases (Linked Open Data): triples such as (Bob_Dylan, compose, Like_a_Rolling_Stone) and (The_Dark_Knight, directed_by, Christopher_Nolan). Entity Graph; OpenIE (Ollie, ReVerb); Economic Graph.
Knowledge Bases (Linked Open Data): YAGO. #classes: 350,000; #entities: 10 million; #facts: 120 million; #languages: 10.
Knowledge Bases (Linked Open Data): DBpedia. #classes: 735; #entities: 38.3 million; #triples: 6.9 billion; #languages: 128.
Knowledge Bases (Linked Open Data): Freebase. #entities: 50 million; #facts: 3 billion; #languages: almost 70.
Construct YAGO from (Semi-)Structured Data
IE Challenge: Ambiguity of Natural Language. "I made her duck."
i. I cooked waterfowl for her.
ii. I cooked waterfowl belonging to her.
iii. I created the duck she owns.
iv. I caused her to quickly lower her head or body.
v. I waved my magic wand and turned her into a waterfowl.
Named Entity Recognition. TASK: find entity mentions and their types in "Research at Stanford led to a search engine company, founded by Page and Brin." (Stanford: ORG; Page, Brin: PER). As a machine learning problem, assign one label per token: Research/O at/O Stanford/ORG led/O to/O search/O engine/O company/O ,/O founded/O by/O Page/PER and/O Brin/PER ./O
Learning and Prediction: sentences pass through feature extraction; when labels are available, they are used to train models; when they are not, the trained model predicts labels, yielding labeled sentences.
Feature Extraction. Use features to represent each word.
Features of "Stanford": w-2=Research, w-1=at, w0=Stanford, w+1=led, w+2=to, POS=noun, capitalized?=true.
Features of "search": w-2=to, w-1=a, w0=search, w+1=engine, w+2=company, POS=noun, capitalized?=false.
Vectorise the feature representations, e.g. binary indicators for w-2=research, w0=stanford, w0=search, capitalized.
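The window features and their vectorisation can be sketched in a few lines of Python. This is an illustrative reconstruction, not the exact feature set or feature names used on the slide; `extract_features` and `vectorise` are invented helper names.

```python
# Sketch of window-based feature extraction for NER: each word is represented
# by the words in a +/-2 token window plus simple surface properties.

def extract_features(tokens, i):
    """Return the set of (string) features for the token at position i."""
    feats = {"w0=" + tokens[i].lower()}
    if i >= 1:
        feats.add("w-1=" + tokens[i - 1].lower())
    if i >= 2:
        feats.add("w-2=" + tokens[i - 2].lower())
    if i + 1 < len(tokens):
        feats.add("w+1=" + tokens[i + 1].lower())
    if i + 2 < len(tokens):
        feats.add("w+2=" + tokens[i + 2].lower())
    if tokens[i][0].isupper():
        feats.add("capitalized")
    return feats

def vectorise(feats, feature_index):
    """Binary vector over a fixed feature-to-index map."""
    vec = [0] * len(feature_index)
    for f in feats:
        if f in feature_index:
            vec[feature_index[f]] = 1
    return vec

tokens = "Research at Stanford led to a search engine company".split()
feats = extract_features(tokens, 2)  # features of "Stanford"
# contains e.g. 'w-2=research', 'w-1=at', 'w0=stanford', 'w+1=led', 'capitalized'

feature_index = {"w0=stanford": 0, "capitalized": 1, "w0=search": 2}
vec = vectorise(feats, feature_index)  # [1, 1, 0]
```

In practice the feature-to-index map covers every feature seen in training, so the vectors are long and very sparse.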
Standard Model: Conditional Random Fields. Assigns a local score to each (word, label) pair; joint inference finds the best label sequence. CRF: p(y|x) = exp( \sum_{t=1}^{T} \sum_i \lambda_i f_i(y_{t-1}, y_t, x_t) ) / Z. Stanford NER [1]: 86%; best system [8]: 89%.
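The CRF probability above can be made concrete with a toy example. The two feature functions and their weights below are invented for illustration, and the partition function Z is computed by brute-force enumeration over all label sequences rather than the forward algorithm used in real implementations.

```python
import itertools
import math

# Toy linear-chain CRF: p(y|x) = exp(sum_t sum_i lambda_i * f_i(y_{t-1}, y_t, x_t)) / Z

LABELS = ["O", "ORG"]

FEATURES = {
    # each f_i(y_{t-1}, y_t, x_t) returns 1.0 if its pattern fires, else 0.0
    "cap->ORG": lambda prev, y, w: 1.0 if w[0].isupper() and y == "ORG" else 0.0,
    "lower->O": lambda prev, y, w: 1.0 if w.islower() and y == "O" else 0.0,
}

def score(y, x, weights):
    """Unnormalised log-score: sum over positions t and features i."""
    s, prev = 0.0, "START"
    for label, word in zip(y, x):
        for name, f in FEATURES.items():
            s += weights[name] * f(prev, label, word)
        prev = label
    return s

def crf_probability(y, x, weights):
    z = sum(math.exp(score(cand, x, weights))
            for cand in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y, x, weights)) / z

x = ["Research", "at", "Stanford"]
weights = {"cap->ORG": 2.0, "lower->O": 2.0}
best = max(itertools.product(LABELS, repeat=len(x)),
           key=lambda y: crf_probability(y, x, weights))
# with only a capitalisation feature, every capitalised word gets ORG:
# best == ('ORG', 'O', 'ORG')
```

The toy weights mislabel "Research" as ORG, which is exactly why real CRFs use many more features (context words, transitions between labels) learned from data.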
Named Entity Disambiguation. TASK: link each recognised mention to its knowledge-base entity in "Research at Stanford led to a search engine company, founded by Page and Brin.": Stanford (ORG) → Stanford University; Page (PER) → Larry Page; Brin (PER) → Sergey Brin.
AIDA-light [2]
First Stage
Second Stage. Accuracy: AIDA-light [2]: 84.8%; DBpedia Spotlight: 75%.
Relation Extraction. Relation mention extraction: given "Research at Stanford led to a search engine company, founded by Page and Brin." with mentions ORG: Stanford_University, PER: Larry_Page, PER: Sergey_Brin, decide which relations hold between the mentions. Expand knowledge bases: which relation links Larry Page and Stanford University? The Dark Knight and Christopher Nolan?
Relation Mention Extraction. Multi-class classification. Example features of a pair of entity mentions [3], for (Stanford, Page) in "Research at Stanford led to a search engine company, founded by Page and Brin.":
words between: led, to, a, search, engine, company, founded, by
named entity types: (ORG, PER)
number of mentions between: 0
F-measure on ACE: 71.2% [3]
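The mention-pair features listed above can be computed with a short sketch. The function name `pair_features` and the span encoding are invented for illustration; the feature set itself follows the slide.

```python
# Features for a pair of entity mentions: the words between them, their
# entity types, and the number of other mentions between them.

def pair_features(tokens, mentions, i, j):
    """Features for the pair (mentions[i], mentions[j]).

    `mentions` is a list of (start, end, type) token spans, end exclusive."""
    s1, e1, t1 = mentions[i]
    s2, e2, t2 = mentions[j]
    between = tokens[e1:s2]
    n_between = sum(1 for (s, e, _) in mentions if e1 <= s and e <= s2)
    return {
        "words_between": between,
        "entity_types": (t1, t2),
        "mentions_between": n_between,
    }

tokens = ("Research at Stanford led to a search engine company , "
          "founded by Page and Brin .").split()
mentions = [(2, 3, "ORG"), (12, 13, "PER"), (14, 15, "PER")]  # Stanford, Page, Brin
f = pair_features(tokens, mentions, 0, 1)  # the (Stanford, Page) pair
# words_between: ['led', 'to', 'a', 'search', 'engine', 'company', ',', 'founded', 'by']
# entity_types: ('ORG', 'PER'); mentions_between: 0
```

These dictionaries would then be vectorised and fed to a multi-class classifier over the relation labels, one class per relation plus a "no relation" class.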
Expand Knowledge Base. Multi-instance, multi-label learning [4,5]. Distant supervision: Freebase provides a relation-level label for the pair (Larry Page, Sergey Brin), but the mention-level labels of individual sentences such as "Research at Stanford led to a search engine company, founded by Page and Brin." and "Larry Page and Sergey Brin explained why they just created Alphabet." are unknown. MAP [3]: 56%; MAP [4]: 66%.
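The distant-supervision step can be sketched as projecting knowledge-base facts onto raw sentences: any sentence mentioning both arguments of a fact becomes a noisy training example. The KB triples and the naive string matching below are illustrative only; real systems use entity linking rather than substring tests.

```python
# Distant supervision: align KB facts with sentences that mention both arguments.

kb = [
    ("Larry Page", "co-founder_of", "Google"),
    ("Sergey Brin", "co-founder_of", "Google"),
]

sentences = [
    "Larry Page and Sergey Brin founded Google in 1998 .",
    "Larry Page and Sergey Brin explained why they just created Alphabet .",
]

def distant_labels(sentences, kb):
    examples = []
    for sent in sentences:
        for subj, rel, obj in kb:
            if subj in sent and obj in sent:
                examples.append((subj, rel, obj, sent))
    return examples

examples = distant_labels(sentences, kb)
# only the first sentence mentions Google, so it yields two noisy examples;
# the Alphabet sentence gets no label even though the founders appear in it
```

This illustrates why the labels are noisy and relation-level rather than mention-level: a matched sentence need not actually express the relation, and some true mentions are missed entirely, which is exactly what multi-instance multi-label models are designed to handle.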
Open Information Extraction. Extract triples of any relation from the web [6]. "It was exactly 50 years ago today that Bob Dylan walked into Studio A at Columbia Records in New York and recorded 'Like a Rolling Stone.'" → (Bob Dylan, record, Like a Rolling Stone). Optional: link the triple's arguments to knowledge-base entities, e.g. Bob_Dylan and Like_a_Rolling_Stone. F1 [6]: 19.6%; F1 [9]: 28.3%.
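A deliberately minimal sketch of the triple format: one hand-written surface pattern pulls (arg1, relation, arg2) out of text. Real Open IE systems like Ollie and ReVerb learn many patterns over parses; the regular expression and verb list here are invented to show only the output shape.

```python
import re

# One toy pattern: "<Capitalised phrase> <verb> "<quoted phrase>""
PATTERN = re.compile(r'([A-Z][\w ]+?) (recorded|directed|founded) "([^"]+)"')

def extract_triples(text):
    """Return (arg1, relation, arg2) triples matched by the toy pattern."""
    return [(m.group(1), m.group(2), m.group(3)) for m in PATTERN.finditer(text)]

text = 'Bob Dylan recorded "Like a Rolling Stone" in 1965.'
triples = extract_triples(text)
# → [('Bob Dylan', 'recorded', 'Like a Rolling Stone')]
```

Linking would then map the string arguments to knowledge-base entities, turning the surface triple into (Bob_Dylan, record, Like_a_Rolling_Stone).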
Harvest Domain-Specific Knowledge. Deep learning: learn cross-domain features; minimise training data. Transfer learning: from a source domain (newswire) to a target domain (nurse handovers).
Word Representation. One-hot representation:
stanford   [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
university [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
oxford     [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
conference [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
talk       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
Distributed representation: dense vectors such as [0.01, 0.3, -0.5, 0.6], which place related words (stanford, university, oxford; conference, talk) close together.
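The contrast can be demonstrated with cosine similarity: one-hot vectors make every pair of distinct words equally dissimilar, while dense vectors can encode relatedness. The dense values below are invented for illustration; real ones come from a trained model such as word2vec.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

vocab = ["oxford", "stanford", "university", "conference", "talk"]
one_hot = {w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}

# toy dense vectors: related words get similar coordinates
dense = {
    "stanford":   [0.01, 0.30, -0.50, 0.60],
    "oxford":     [0.02, 0.28, -0.45, 0.58],
    "conference": [-0.70, 0.10, 0.40, -0.20],
}

cosine(one_hot["stanford"], one_hot["oxford"])  # 0.0: all one-hot pairs are orthogonal
cosine(dense["stanford"], dense["oxford"])      # close to 1.0: related words
cosine(dense["stanford"], dense["conference"])  # negative: unrelated words
```

This is the property that makes distributed representations useful downstream: a classifier that has seen "stanford" can generalise to "oxford" because their inputs are nearby.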
Distributed Representation
Apply Distributed Representations to NER. Represent words based on positions rather than IDs: the feature matrix concatenates the embeddings of the 2nd word to the left, first word to the left, current word, first word to the right, and 2nd word to the right. Example (token/label): compare/O stanford/UNI university/UNI and/O oxford/UNI.
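Building that position-based input can be sketched as follows. The embedding values are invented (real ones would come from e.g. word2vec), and `window_features` is a hypothetical helper, but the layout matches the slide: one embedding slot per window position.

```python
# The representation of a token is the concatenation of the embeddings of the
# words at positions w-2, w-1, w0, w+1, w+2, with zero padding at the edges.

EMB_DIM = 3
PAD = [0.0] * EMB_DIM

embeddings = {
    "compare":    [0.1, -0.2, 0.3],
    "stanford":   [0.5, 0.1, -0.4],
    "university": [0.4, 0.2, -0.3],
    "and":        [0.0, 0.1, 0.0],
    "oxford":     [0.6, 0.1, -0.5],
}

def window_features(tokens, i, size=2):
    """Concatenate embeddings of tokens[i-size .. i+size], padding at edges."""
    row = []
    for pos in range(i - size, i + size + 1):
        if 0 <= pos < len(tokens):
            row += embeddings.get(tokens[pos], PAD)
        else:
            row += PAD
    return row

tokens = "compare stanford university and oxford".split()
row = window_features(tokens, 2)  # features for "university"
# length = 5 positions * 3 dims = 15
```

Because the slots are positional, the same embedding serves a word wherever it appears, which is what lets the model generalise across sentences and, with shared embeddings, across domains.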
Results of Named Entity Recognition [7]. Word embeddings reduce the amount of training data needed; the differences between word embeddings are tiny.
NER for Novel Named Entity Types. Goals: minimise labeled training data; leverage existing resources (labeled corpora, unlabeled text, existing knowledge bases). Source-domain types map to finer target-domain types: person → doctor, patient; location → country, city, hotel; organization → corporation.
Experimental Results on I2B2
Learn Text Representations for Relations. Unsupervised pre-training plus distant supervision: Freebase provides the relation-level label co-founders for (Larry Page, Sergey Brin); mention-level labels are inferred for sentences such as "Research at Stanford led to a search engine company, founded by Page and Brin." and "Larry Page and Sergey Brin explained why they just created Alphabet."
NICTA Deep Learning for IE Toolkit. A fully integrated deep learning toolkit for NLP: pipelines include both NLP preprocessing and DL components. Written in Scala/Java; easy to write new ML components; reuses UIMA NLP components. Scalable: easy switching between GPUs and CPUs, learning on GPUs, and UIMA-based prediction.
References
[1] Jenny Rose Finkel, Trond Grenager, and Christopher Manning. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), 2005.
[2] Dat Ba Nguyen, et al. AIDA-light: High-Throughput Named-Entity Disambiguation. Linked Data on the Web at WWW 2014, 2014.
[3] Yee Seng Chan and Dan Roth. Exploiting Background Knowledge for Relation Extraction. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), 2010.
[4] Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. Multi-instance Multi-label Learning for Relation Extraction. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), 2012.
[5] Sebastian Riedel, et al. Relation Extraction with Matrix Factorization and Universal Schemas. 2013.
[6] Michael Schmitz, et al. Open Language Learning for Information Extraction. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), 2012.
[7] Lizhen Qu, et al. Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The Impact of Word Representation on Sequence Labelling Tasks. arXiv preprint arXiv:1504.05319, 2015.
[8] Rie Kubota Ando and Tong Zhang. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research, 6:1817–1853, 2005.
[9] Gabor Angeli, Melvin Johnson Premkumar, and Christopher D. Manning. Leveraging Linguistic Structure for Open Domain Information Extraction.
Resources
YAGO: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago
DBpedia: http://wiki.dbpedia.org/
Alchemy: http://querybuilder.alchemyapi.com/builder
Deep learning: http://www.deeplearning.net/
Word2vec: https://code.google.com/p/word2vec/
Mallet (Java): http://mallet.cs.umass.edu/
Factorie (Scala): http://factorie.cs.umass.edu/
Stanford CoreNLP: http://nlp.stanford.edu:8080/corenlp/
NLP conferences: ACL, EMNLP, COLING, NAACL, EACL.
NLP online courses: https://www.coursera.org/course/nlangp ; https://www.youtube.com/playlist?list=pl6397e4b26d00a269
ML online courses: https://www.coursera.org/course/ml ; https://www.coursera.org/course/neuralnets ; http://www.socher.org/index.php/deeplearningtutorial/deeplearningtutorial