YAGO: a Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames

Size: px

Start display at page:

Download "YAGO: a Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames"

Cornelius Lindsey
6 years ago
Views:

Joanna Biega 2, Thomas Rebele 1, Erdal Kuzey 2, Gerhard Weikum

1 1/38 YAGO: a Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames Fabian Suchanek 1, Johannes Hoffart 2, Joanna Biega 2, Thomas Rebele 1, Erdal Kuzey 2, Gerhard Weikum 2 1 Télécom ParisTech 2 Max Planck Institute for Informatics

2 Plan What is a knowledge base? YAGO Combining WordNet with Wikipedia Multilingual: facts and entities from 10 Wikipedias Time and space Accuracy Applications DBpedia IBM Watson AIDA Semantic Culturomics AMIE Partial completeness assumption Rule mining in knowledge bases Canonicalization of knowledge bases Wikilinks semantification Completeness Conclusion References 2/38

3 What is a knowledge base? 3/38 Mileva Marić married Albert Einstein Image source:

4 What is a knowledge base? 3/38 Mileva Marić married Albert Einstein has advisor Alfred Kleiner Image source:

5 What is a knowledge base? 3/38 Mileva Marić married Albert Einstein has advisor won prize Alfred Kleiner Nobel Prize in Physics Image source:

6 Where do knowledge bases help? 4/38 Applications of knowledge bases question answering semantic search intelligent personal assistants investigative journalism text analysis machine translation Hence, everybody does it Google: Knowledge Graph Amazon: Evi Microsoft: Satori

7 What is YAGO? 5/38 knowledge base with 10 million entities and>210 million facts multilingual facts from 10 languages focus on precision developed by Max-Planck Institute for Informatics and Télécom ParisTech

8 What is YAGO? 6/38 <Albert_Einstein> married advisor won Mileva Nobel Prize Alfred Kleiner

9 What does YAGO know? 7/38 theme name taxonomy-related facts simplified taxonomy main facts GeoNames facts meta-facts multilingual class labels maps to other knowledge bases raw information Wikipedia redirect links and anchor texts Themes of YAGO number of facts 95 m 17 m 55 m 39 m 203 m 787 k 4 m 296 m 471 m

10 What does YAGO know? - Sources 8/38 - Wikipedia - WordNet - GeoNames

11 YAGO 2006 idea: combine entities from Wikipedia with WordNet taxonomy 2007 YAGO: first version [SKW07] 2008 extract facts from infoboxes [SKW08] 2011 YAGO2: temporal and geographical meta facts [Hof+11a]; [Hof+13] 2013 YAGO2s: modular architecture [Suc+13] 2015 YAGO3: entities and facts from 10 languages [MBS15] Fabian Suchanek Johannes Hoffart Joanna Biega Gerhard Weikum Erdal Kuzey F. Mahdisoltani Gjergji Kasneci Thomas Rebele 9/38

12 YAGO - Combining WordNet with Wikipedia <wordnet_person>... rdfs:subclassof rdfs:subclassof <wordnet_physicist> rdfs:subclassof <wikicat_20th-century_physicists> Wordnet taxonomy Wikipedia categories rdf:type Albert Einstein 10/38

13 YAGO - Combining WordNet with Wikipedia Graph Browser w3id.org/yago/svgbrowser 11/38

14 YAGO - Multilingual: facts and entities from 10 Wikipedias 12/38

15 YAGO - Multilingual: facts and entities from 10 Wikipedias intermediate extractors clean facts deduplicate entities and facts check consistency 12/38

16 YAGO - Multilingual: facts and entities from 10 Wikipedias birth_place GEBURTSORT lieu de naissance (Einstein, Ulm) (Hawking, Oxford) (Euler, Basel)... (Einstein, Ulm) (Hawking, Oxford) (Euler, Basel)... (Einstein, Ulm) (Hawking, Oxford) (Euler, Basel)... 13/38

17 YAGO - Multilingual: facts and entities from 10 Wikipedias birth_place GEBURTSORT lieu de naissance (Einstein, Ulm) (Hawking, Oxford) (Euler, Basel)... (Einstein, Ulm) (Hawking, Oxford) (Euler, Basel)... (Einstein, Ulm) (Hawking, Oxford) (Euler, Basel)... P(S 1 S 2 ) > theta GEBURTSORT birth_place lieu de naissance birth_place... 13/38

18 YAGO - Multilingual: facts and entities from 10 Wikipedias YAGO architecture 14/38

19 YAGO - Time and space 15/38 Albert Einstein married Mileva Marić

20 YAGO - Time and space 15/38 Albert Einstein married Mileva Marić <id_tiwmcu_16x_11ovtp>

21 YAGO - Time and space Albert Einstein married Mileva Marić <id_tiwmcu_16x_11ovtp> source < 15/38

22 YAGO - Time and space Albert Einstein married Mileva Marić <id_tiwmcu_16x_11ovtp> source since until 1903-##-## 1919-##-## < 15/38

23 YAGO - Time and space 16/38 SPOTLX Browser w3id.org/yago/spotlx

24 YAGO - Accuracy 17/38 Screenshot of evaluation result 2 months evaluation, 15 participants evaluated 4412 facts of 76 relations (with 60m total facts) 98% facts of the sample were correct Wilson center: 95%, interval width: 4.2%

25 Applications What is a knowledge base? YAGO Combining WordNet with Wikipedia Multilingual: facts and entities from 10 Wikipedias Time and space Accuracy Applications DBpedia IBM Watson AIDA Semantic Culturomics AMIE Partial completeness assumption Rule mining in knowledge bases Canonicalization of knowledge bases Wikilinks semantification Completeness Conclusion References 18/38

26 Applications - DBpedia 19/38 research and community project knowledge base built from Wikipedia imported YAGO s taxonomy

27 Applications - IBM Watson 20/38 participated in Jeopardy (TV quizz show) first place (competitors: human champions) used YAGO s type hierarchy

28 Applications - AIDA 21/38 Named Entity Disambiguation with ADIA, [Hof+11b] github.com/yago-naga/aida

29 Applications - Semantic Culturomics in cooperation with Le Monde Average age of people by occupation, [HBS13] w3id.org/yago/mpi/le-monde 22/38

30 AMIE What is a knowledge base? YAGO Combining WordNet with Wikipedia Multilingual: facts and entities from 10 Wikipedias Time and space Accuracy Applications DBpedia IBM Watson AIDA Semantic Culturomics AMIE Partial completeness assumption Rule mining in knowledge bases Canonicalization of knowledge bases Wikilinks semantification Completeness Conclusion References 23/38

31 AMIE 24/38 AMIE type(x, pope) = diedin(x, Rome) Reference: [Gal+13] [Gal+15] Luis Galárraga Christina Teflioudi Katja Hose Fabian Suchanek

32 AMIE - Partial completeness assumption 25/38 iscitizenof...

33 AMIE - Partial completeness assumption 25/38 false iscitizenof...

34 AMIE - Partial completeness assumption 25/38 false unknown iscitizenof...

AMIE - Rule mining in knowledge bases almamater University of Zurich Albert Einstein worksat livedin livedin citizenof advises Alfred Kleiner

35 AMIE - Rule mining in knowledge bases almamater University of Zurich Albert Einstein worksat livedin livedin citizenof advises Alfred Kleiner Switzerland AMIE almamater(z,y), advises(x,z) livedin(x,y) = worksat(x,y) = iscitizenof(x,y) Diagram by Luis Galárraga Image source: 26/38

36 AMIE - Canonicalization of knowledge bases 27/38 graduaded from earned degree from University of Zurich Albert Einstein AMIE graduated from earned degree from Reference: [Gal+14] Diagram by Luis Galárraga Image source:

AMIE - Wikilinks semantification linksto won

Reference: [GSM15] = won(x, Nobel Prize)

37 AMIE - Wikilinks semantification linksto won Albert Einstein is Nobel Prize linksto??? is Scientist Max Planck AMIE is(x, Scientist), linksto(x, Nobel Prize) Reference: [GSM15] = won(x, Nobel Prize) Diagram by Luis Galárraga Image source: 28/38

38 AMIE - Completeness 29/38 Albert Einstein married Mileva Marić Image source:

39 AMIE - Completeness Albert Einstein married Mileva Marić married Priscilla Presley? AMIE can mine completeness rules Image source: 29/38

40 AMIE - Completeness 30/38 closed world assumption partial completeness assumption popularity oracle no-change oracle star-pattern oracle class oracle AMIE oracle Reference: [Gal+17] Luis Galárraga Simon Razniewski Antoine Amarilli Fabian Suchanek

oracle AMIE oracle morethan 1 (x, hasparent) = complete(x, hasparent)

41 AMIE - Completeness 30/38 closed world assumption partial completeness assumption popularity oracle no-change oracle star-pattern oracle class oracle AMIE oracle morethan 1 (x, hasparent) = complete(x, hasparent) Reference: [Gal+17] Luis Galárraga Simon Razniewski Antoine Amarilli Fabian Suchanek

42 AMIE - Completeness 31/38 AMIE can predict incompleteness bornin: 100% diedin: 96% directed: 100% graduatedfrom: 87% haschild: 78% ismarriedto: 46%... and more

43 Conclusion What is a knowledge base? YAGO Combining WordNet with Wikipedia Multilingual: facts and entities from 10 Wikipedias Time and space Accuracy Applications DBpedia IBM Watson AIDA Semantic Culturomics AMIE Partial completeness assumption Rule mining in knowledge bases Canonicalization of knowledge bases Wikilinks semantification Completeness Conclusion References 32/38

44 Conclusion 33/38 knowledge base with 10 million entities and>210 million facts combines Wikipedia with Wordnet multilingual facts from 10 languages has facts about time and space contains>95% correct facts available at

45 Conclusion 33/38 knowledge base with 10 million entities and>210 million facts combines Wikipedia with Wordnet multilingual facts from 10 languages has facts about time and space contains>95% correct facts available at Announce: in the following weeks new data release YAGO goes Open Source

46 References What is a knowledge base? YAGO Combining WordNet with Wikipedia Multilingual: facts and entities from 10 Wikipedias Time and space Accuracy Applications DBpedia IBM Watson AIDA Semantic Culturomics AMIE Partial completeness assumption Rule mining in knowledge bases Canonicalization of knowledge bases Wikilinks semantification Completeness Conclusion References 34/38

47 References 35/38 Joanna Biega, Erdal Kuzey, and Fabian M. Suchanek. Inside YAGO2s: A transparent information extraction architecture. In: Proceedings of the 22nd international conference on World Wide Web companion International World Wide Web Conferences Steering Committee, 2013, pp Luis Galárraga, Christina Teflioudi, Katja Hose, and Fabian M. Suchanek. AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In: WWW Luis Galárraga, Geremy Heitz, Kevin Murphy, and Fabian M. Suchanek. Canonicalizing Open Knowledge Bases. In: ACM Press, 2014, pp ISBN: DOI: / Luis Galárraga, Christina Teflioudi, Katja Hose, and Fabian M. Suchanek. Fast Rule Mining in Ontological Knowledge Bases with AMIE+. In: VLDBJ

48 References 36/38 Luis Galárraga, Simon Razniewski, Antoine Amarilli, and Fabian M. Suchanek. Predicting Completeness in Knowledge Bases. In: ACM Press, 2017, pp ISBN: DOI: / Luis Galarraga, Danai Symeonidou, and Jean-Claude Moissinac. Rule Mining for Semantifying Wikilinks. In: Linked Open Data Workshop at WWW Thomas Huet, Joanna Biega, and Fabian M. Suchanek. Mining history with Le Monde. In: Proceedings of the 2013 workshop on Automated knowledge base construction ACM Press, 2013, pp ISBN: DOI: / Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, and Gerhard Weikum. YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia. Research Report. Max-Planck-Institut für Informatik, 2010

49 References 37/38 Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, Edwin Lewis-Kelham, Gerard De Melo, and Gerhard Weikum. YAGO2: exploring and querying world knowledge in time, space, context, and many languages. In: Proceedings of the 20th international conference companion on World wide web ACM, 2011, pp Johannes Hoffart et al. Robust disambiguation of named entities in text. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing Association for Computational Linguistics, 2011, pp Johannes Hoffart, Fabian M Suchanek, Klaus Berberich, and Gerhard Weikum. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. In: Artificial Intelligence 194 (2013) , pp Farzaneh Mahdisoltani, Joanna Biega, and Fabian Suchanek. YAGO3: A knowledge base from multilingual Wikipedias. In: 7th Biennial Conference on Innovative Data Systems Research CIDR 2015, 2015

50 References 38/38 Thomas Rebele, Fabian Suchanek, Johannes Hoffart, Joanna Biega, Erdal Kuzey, and Gerhard Weikum. YAGO: a multilingual knowledge base from Wikipedia, Wordnet, and Geonames. In: ISWC Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In: Proceedings of the 16th international conference on World Wide Web ACM, 2007, pp Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: A large ontology from wikipedia and wordnet. In: Web Semantics: Science, Services and Agents on the World Wide Web 6.3 (2008) , pp Fabian M. Suchanek, Johannes Hoffart, Erdal Kuzey, and Edwin Lewis-Kelham. YAGO2s: Modular High-Quality Information Extraction with an Application to Flight Planning.. In: BTW. Vol , pp Acknowledgements: Icons designed by Madebyoliver from Flaticon

YAGO: a Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames

YAGO: a Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames 1/24 a from Wikipedia, Wordnet, and Geonames 1, Fabian Suchanek 1, Johannes Hoffart 2, Joanna Biega 2, Erdal Kuzey 2, Gerhard Weikum 2 Who uses 1 Télécom ParisTech 2 Max Planck Institute for Informatics