Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling

1 Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling: Chenyan Xiong, Zhengzhong Liu, Jamie Callan, and Tie-Yan Liu* (Carnegie Mellon University & Microsoft Research*)

4 Document Understanding in Search: interaction-based ranking models build on bag-of-words (word matches) and bag-of-entities (entity semantics) representations of the document. Bag-of-terms representations are effective and efficient, but they mostly carry frequency signals.

5 Shallow Understanding of Bag-of-Terms: Frequency ≠ Importance. Chart: rank of title entities by frequency in their Wikipedia pages (Top 1, Top 2, Top 3, Other). In 57% of pages, the title entity is not the most frequent entity.
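
To make "frequency ≠ importance" concrete, the sketch below ranks a page's entities by mention count and reports where the title entity lands. This is the kind of statistic behind the 57% figure. It assumes entity linking has already produced a list of entity mentions. `frequency_rank` and the mention list are hypothetical and are not the authors' analysis code.

```python
from collections import Counter


def frequency_rank(entity_mentions, target):
    """Return the 1-based rank of `target` when entities are sorted by mention count.

    Ties keep first-appearance order (Counter.most_common orders equal counts
    by first insertion). Returns None if `target` is never mentioned.
    """
    counts = Counter(entity_mentions)
    if target not in counts:
        return None
    ranking = [entity for entity, _ in counts.most_common()]
    return ranking.index(target) + 1


# Hypothetical linked-entity mentions from a page whose title entity is "Concussion".
mentions = ["Brain", "Concussion", "Brain", "Headache", "Brain", "Headache"]
print(frequency_rank(mentions, "Concussion"))  # 3: the title entity is only third by frequency
```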

7 Shallow Understanding of Bag-of-Terms: Frequency ≠ Importance. "My name only appears once."

11 The Entity Salience Task: estimate entity importance in documents [Dunietz and Gillick 2014]. Documents: the Annotated NYT corpus, ~half a million news articles, each with a manual summary. Candidate entities: entity annotations, ~200 per article. Salient labels: appearance in the summary, ~28 per article.
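
The labeling rule on this slide can be written as a short sketch: an annotated entity is salient if it also appears in the article's manual summary. The sketch assumes entity annotations for both the article and its summary are already available. The function name and the example entities are hypothetical.

```python
def salience_labels(candidate_entities, summary_entities):
    """Label each candidate entity 1 (salient) if it appears in the summary, else 0."""
    in_summary = set(summary_entities)
    return {entity: int(entity in in_summary) for entity in candidate_entities}


# Hypothetical annotations for one article and its summary.
candidates = ["Barack Obama", "White House", "Chicago", "Health care"]
summary = ["Barack Obama", "Health care"]
print(salience_labels(candidates, summary))
# {'Barack Obama': 1, 'White House': 0, 'Chicago': 0, 'Health care': 1}
```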

12 This Work: Kernel Entity Salience Estimation. Represent entities by knowledge-enriched embeddings; model entity interactions with a kernel interaction model; learn to estimate salience end-to-end; improve web search by domain adaptation.

13 Intuition: from counting frequency to modeling interactions.

14 Knowledge-Enriched Embedding: learn to represent entities using embeddings and integrate knowledge graph semantics.

15 Step 1: Knowledge-Enriched Embedding. Map entities to embeddings (to be learned): an embedding layer maps the target entity (e.g., Concussion) to its entity embedding $e_i$.

16 Step 1: Knowledge-Enriched Embedding. Introduce the words of the entity description (e.g., "Concussion ... mild traumatic injury ..."). The embedding layer maps them to word embeddings $w_1, \dots, w_n$.

17 Step 1: Knowledge-Enriched Embedding. Compose the description words with CNN filters: $C_i = W^c \cdot w_{i:i+h}$.

18 Step 1: Knowledge-Enriched Embedding. Max-pool the filter outputs into the description embedding: $v_D = \max(C_1, \dots, C_{n-h})$.
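
Slides 17 and 18 amount to a one-dimensional convolution followed by max pooling. The sketch below follows the slides' indexing: $w_{i:i+h}$ spans words $i$ through $i+h$, which yields $n-h$ window positions. It uses plain Python lists and applies no bias or nonlinearity, because none appears on the slides. The names `conv1d` and `max_pool` are hypothetical.

```python
def conv1d(word_vectors, filters, h):
    """C_i = W^c . w_{i:i+h} for i = 1, ..., n - h.

    word_vectors: n word embeddings, each a list of d numbers.
    filters: rows of W^c, each of length (h + 1) * d; one output channel per row.
    Returns n - h vectors, one per window position.
    """
    n = len(word_vectors)
    outputs = []
    for i in range(n - h):
        # Concatenate the h + 1 word vectors w_i, ..., w_{i+h} into one window vector.
        window = [x for vec in word_vectors[i:i + h + 1] for x in vec]
        outputs.append([sum(w * x for w, x in zip(row, window)) for row in filters])
    return outputs


def max_pool(feature_maps):
    """v_D = max(C_1, ..., C_{n-h}), taken separately for each filter channel."""
    return [max(channel) for channel in zip(*feature_maps)]


# Toy description: n = 4 words with d = 2, h = 1 (two-word windows), two filters.
words = [[1, 0], [0, 1], [1, 1], [0, 0]]
W_c = [[1, 0, 0, 1], [0, 1, 1, 0]]
C = conv1d(words, W_c, h=1)
print(C)            # [[2, 0], [1, 2], [1, 1]]
print(max_pool(C))  # [2, 2]
```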

19 Step 1: Knowledge-Enriched Embedding. Combine the entity embedding and the description embedding into the Knowledge-Enriched Embedding (KEE): $v_{e_i} = W^p \cdot (e_i \sqcup v_D)$, where $\sqcup$ denotes concatenation.

20 Step 1: Knowledge-Enriched Embedding. The KEE combines data-driven embeddings (the entity embedding $e_i$) with knowledge graph semantics (the description embedding $v_D$).
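
The KEE combination step is a concatenation followed by a linear projection. This is a minimal sketch, assuming $\sqcup$ means plain vector concatenation and $W^p$ is a matrix applied without a bias. The function name and the toy values are hypothetical.

```python
def knowledge_enriched_embedding(entity_embedding, description_embedding, W_p):
    """v_{e_i} = W^p . concat(e_i, v_D).

    W_p: projection matrix as a list of rows, each of length len(e_i) + len(v_D).
    """
    combined = list(entity_embedding) + list(description_embedding)  # concat(e_i, v_D)
    for row in W_p:
        if len(row) != len(combined):
            raise ValueError("each row of W^p must match the concatenated length")
    return [sum(w * x for w, x in zip(row, combined)) for row in W_p]


e_i = [1.0, 2.0]  # learned entity embedding (toy values)
v_D = [3.0]       # description embedding from CNN + max pooling (toy value)
W_p = [[1, 0, 0], [0, 1, 1]]
print(knowledge_enriched_embedding(e_i, v_D, W_p))  # [1.0, 5.0]
```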

21 Kernel Interaction Model: model term interactions in the embedding space and capture multi-level interactions using kernels.

22 Step 2: Kernel Interaction Model. Model entity interactions in the embedding space: take the cosine similarity between the embedding of the target entity, $v_{e_i}$, and the embedding of each document entity, $v_{e_j}$.
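
The interaction on this slide is one cosine similarity per (target entity, document entity) pair. This is a minimal sketch, assuming the embeddings are non-zero lists of floats. `cosine` and `similarity_row` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similarity_row(target, document_entities):
    """cos(v_{e_i}, v_{e_j}) for the target entity against every document entity e_j."""
    return [cosine(target, v) for v in document_entities]


v_target = [1.0, 0.0]
v_doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(similarity_row(v_target, v_doc))  # approximately [1.0, 0.0, 0.707]
```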

23 Step 2: Kernel Interaction Model. Use RBF kernels to capture multi-level interactions, as in K-NRM [Xiong et al. 2017]. Similar ≠ related, so the kernels model interactions at multiple levels and let the data decide: $\phi_k(e_i, e_j) = \exp\left(-\frac{(\cos(v_{e_i}, v_{e_j}) - \mu_k)^2}{2\sigma_k^2}\right)$ and $\Phi(e_i, e_j) = \{\phi_1(e_i, e_j), \dots, \phi_K(e_i, e_j)\}$.
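
The kernels on this slide turn each cosine similarity into a $K$-dimensional vector. This sketch assumes K-NRM-style settings: one near-exact-match kernel at $\mu = 1$ with a very small $\sigma$, plus ten soft-match kernels spaced 0.2 apart with $\sigma = 0.1$. These values are illustrative and are not taken from the slides. Each kernel responds most to similarities near its own $\mu_k$, which is how the features can separate near-identical entities from merely related ones.

```python
import math

# Illustrative kernel settings (an assumption, K-NRM style): an exact-match kernel
# at mu = 1.0 plus soft-match kernels at 0.9, 0.7, ..., -0.9.
MUS = [1.0, 0.9, 0.7, 0.5, 0.3, 0.1, -0.1, -0.3, -0.5, -0.7, -0.9]
SIGMAS = [1e-3] + [0.1] * 10


def rbf_kernels(similarity, mus=MUS, sigmas=SIGMAS):
    """Phi(e_i, e_j) = [phi_1, ..., phi_K] with
    phi_k = exp(-(cos(v_{e_i}, v_{e_j}) - mu_k)^2 / (2 * sigma_k^2))."""
    return [math.exp(-((similarity - mu) ** 2) / (2 * sigma ** 2))
            for mu, sigma in zip(mus, sigmas)]


features = rbf_kernels(0.9)
print(len(features), round(features[1], 3), round(features[2], 3))  # 11 1.0 0.135
```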

24 Step 2: Kernel Interaction Model. Treat the multi-level interactions as votes for the target entity, summed over the document entities $E$: $\Phi(e_i, E) = \sum_{e_j \in E} \Phi(e_i, e_j)$.
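
Summing the kernel responses over all document entities turns them into votes for the target entity. This sketch assumes the cosine similarities have already been computed. `kernel_vector`, `kernel_votes`, and the two kernel settings in the usage line are hypothetical toy choices.

```python
import math


def kernel_vector(similarity, mus, sigmas):
    """Phi(e_i, e_j): one RBF response per kernel for a single cosine similarity."""
    return [math.exp(-((similarity - mu) ** 2) / (2 * sigma ** 2))
            for mu, sigma in zip(mus, sigmas)]


def kernel_votes(similarities, mus, sigmas):
    """Phi(e_i, E) = sum over e_j in E of Phi(e_i, e_j).

    similarities: cos(v_{e_i}, v_{e_j}) for every document entity e_j.
    Entry k is a soft count of document entities whose similarity to e_i is near mu_k.
    """
    votes = [0.0] * len(mus)
    for s in similarities:
        for k, response in enumerate(kernel_vector(s, mus, sigmas)):
            votes[k] += response
    return votes


# Two toy kernels: mu = 1.0 ("same") and mu = 0.5 ("related").
print(kernel_votes([1.0, 0.5, 0.5], mus=[1.0, 0.5], sigmas=[0.1, 0.1]))
# roughly [1.0, 2.0]: one near-identical entity, two related ones
```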

25 Step 2: Kernel Interaction Model. Model word-entity interactions as well: kernels over the document words $w_1, \dots, w_m$ give the word kernels $\Phi(e_i, W)$, which are concatenated with the entity kernels: $\mathrm{KIM}(e_i, d) = \Phi(e_i, E) \sqcup \Phi(e_i, W)$.

26 Step 2: Kernel Interaction Model. The kernel scores serve as features for downstream tasks: $\Phi(e_i, E)$ collects multi-level votes from the other entities, and $\Phi(e_i, W)$ collects multi-level votes from the other words.

27 Step 3: End-to-End Learning for Salience. Combine the entity and word kernels (entity votes and word votes) into the salience score: $f(e_i, d) = W^s \cdot \left(\Phi(e_i, E) \sqcup \Phi(e_i, W)\right) + b^s$.
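
The scoring layer on this slide is linear in the concatenated kernel features. This is a minimal sketch, assuming the entity and word vote vectors have already been computed. The weights and bias are toy values standing in for the learned parameters $W^s$ and $b^s$, and `salience_score` is a hypothetical name.

```python
def salience_score(entity_votes, word_votes, W_s, b_s):
    """f(e_i, d) = W^s . concat(Phi(e_i, E), Phi(e_i, W)) + b^s."""
    features = list(entity_votes) + list(word_votes)  # KIM(e_i, d)
    if len(W_s) != len(features):
        raise ValueError("W^s needs one weight per kernel feature")
    return sum(w * x for w, x in zip(W_s, features)) + b_s


# Toy inputs: two entity-kernel votes and one word-kernel vote.
print(salience_score([1.0, 2.0], [0.5], W_s=[0.1, 0.2, 0.4], b_s=0.05))  # about 0.75
```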

28 Step 3: End-to-End Learning for Salience. Train with pairwise learning to rank and a hinge loss so that salient entities score above the others: $\sum_{e^+, e^- \in d} \max\left(0, 1 - f(e^+, d) + f(e^-, d)\right)$, where $e^+$ is a salient entity and $e^-$ is a non-salient one.
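
For one document, the loss on this slide compares every salient entity with every non-salient one. This sketch assumes all such pairs are enumerated; a real trainer might sample pairs instead. The function name and the scores are hypothetical.

```python
def pairwise_hinge_loss(scores, labels, margin=1.0):
    """Sum over (e+, e-) pairs of max(0, margin - f(e+, d) + f(e-, d)) for one document.

    scores: f(e, d) for each candidate entity; labels: 1 = salient, 0 = not salient.
    margin = 1 matches the loss on the slide.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return sum(max(0.0, margin - p + n) for p in positives for n in negatives)


# One salient entity (score 2.0) and two non-salient ones (0.5 and 1.8).
print(pairwise_hinge_loss([2.0, 0.5, 1.8], [1, 0, 0]))
# about 0.8: only the entity scored 1.8 falls inside the margin
```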

30 Step 3: End-to-End Learning for Salience. Learn end-to-end, letting the kernels allocate the embedding space.

31 Salience Experiments: can we do better than counting frequency?

32 Salience Estimation Performance. [Figure: Precision@5 of Freq, LTR, EMB, and KESM, shown in two panels.] Freq: frequency count. LTR: feature-based learning to rank. EMB: raw embeddings. KESM: Kernel Entity Salience Model.

33 Salience Estimation Performance: frequency is a strong indicator of importance.
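
The charts report Precision@k over each article's candidate entities. This is a minimal sketch of the metric, assuming each article has at least k candidates and that ties follow Python's stable sort order. The scores can come from any method, including the raw frequency counts of the Freq baseline. The function name and toy data are hypothetical.

```python
def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring candidate entities that are labeled salient."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


# Toy scores for six candidates (e.g., mention counts) and their salience labels.
scores = [9, 1, 8, 4, 3, 2]
labels = [1, 0, 0, 1, 0, 1]
print(precision_at_k(scores, labels, 1), precision_at_k(scores, labels, 5))  # 1.0 0.6
```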

34 Salience Estimation Performance: LTR, …% with linguistic and semantic features.

35 Salience Estimation Performance: without kernels, embeddings bring no gains (EMB).

36 Salience Estimation Performance: +25% to 30% with our Kernel Salience Model (KESM).

37 Ad Hoc Search Task: how to improve search using entity salience.

38 Step 4: Adapt to Web Search. Rank documents by the salience of the query entities in the document: $f(q, d) = W^r \cdot \sum_{e_i \in q} \log\left(\Phi(e_i, E) \sqcup \Phi(e_i, W)\right)$, with the log applied element-wise.
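
For a query, the salience kernel features are log-transformed and pooled over the query entities. This sketch assumes the KIM vector of each query entity against the document has been precomputed, and it treats $W^r$ as a weight vector that the downstream ranker would learn. `rank_score`, the `eps` clamp, and the toy inputs are hypothetical and do not come from the slides.

```python
import math


def rank_score(query_entity_kim, W_r, eps=1e-10):
    """f(q, d) = W^r . sum over query entities e_i of log KIM(e_i, d).

    query_entity_kim: one KIM(e_i, d) = concat(Phi(e_i, E), Phi(e_i, W)) vector per query entity.
    The log is element-wise; eps guards against log(0) (an assumption, not from the slides).
    """
    pooled = [0.0] * len(W_r)
    for kim in query_entity_kim:
        if len(kim) != len(W_r):
            raise ValueError("W^r needs one weight per KIM feature")
        for k, value in enumerate(kim):
            pooled[k] += math.log(max(value, eps))
    return sum(w * x for w, x in zip(W_r, pooled))


# Two query entities with toy two-dimensional KIM vectors.
print(rank_score([[math.e, 1.0], [1.0, math.e]], W_r=[1.0, 2.0]))  # about 3.0
```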

39 Step 4: Adapt to Web Search. The kernel features capture the interactions between query entities and document terms.

41 Step 4: Adapt to Web Search. Pre-train the model on the NYT salience labels, then use its outputs as ranking features in standard learning to rank; the salience model generalizes to web search.

45 Ranking Performance on TREC: word match (LTR) baseline; +6% with entity frequency (ESR); +3% with user clicks (Conv-KNRM); +5% with entity salience (KERM). ESR: entity frequencies and knowledge graph embeddings [WWW 2017]. Conv-KNRM: n-gram soft matches pre-trained on the Bing search log [WSDM 2018]. KERM: entity salience pre-trained on the NYT salience labels (this work). Results are on ClueWeb09-B; results on ClueWeb12-B13 are similar.

48 Conclusion. Understanding: from counting frequency to modeling interactions, with knowledge-enriched embeddings and a kernel-based interaction model. Data-driven: learn interaction patterns and embeddings end-to-end. Generalizable: better salience estimation leads to better search, bringing together fine-grained text processing and information retrieval systems.

50 Many thanks to my co-authors! Come to our KG4IR workshop! Code and data will be available on my website. Questions?

51 The Entity Salience Task: Semantic Scholar. Predict which entities appear in the paper title, on the premise that salient entities should be mentioned in the title. Documents: paper abstract and title, one million documents. Candidate entities: entity annotations, ~70 per abstract. Salient labels: entities that appear in the title, ~7 per paper.
