Fractional Similarity : Cross-lingual Feature Selection for Search


Derek Gilbert
6 years ago

1 Fractional Similarity: Cross-lingual Feature Selection for Search. Jagadeesh Jagarlamudi, University of Maryland, College Park, USA. Joint work with Paul N. Bennett, Microsoft Research, Redmond, USA.

2 Using All the Data: An existing search engine, with its training data (judgments, clicks, etc.), can help a search engine in a new language/market. Query and result documents are in the same language.

3 Problem Statement: Improve a foreign-language ranker using English search engine data. Query and result documents are in the same language, which makes this different from CLIR. Obtaining relevance judgments is expensive. Potential advantages: increased training data and the quality of behavioral data. The trivial solution may not be optimal: the amount of signal carried by features can differ because of differences in languages/regions.

4 Outline: Problem Statement; Approaches and Sub-problems; Challenges in capturing feature similarity; Our Approach (Fractional Similarity, Cross-lingual Feature Selection); Experiments; Conclusion


5 Approaches and sub-problems: 1. Using the queries in both languages: a) linguistically non-ambiguous queries [Gao et al. 2008], e.g. Harry Potter; b) joint relevance estimation for bilingual queries [Gao et al. 2009]; c) Multi-PRF [Chinnakotla et al., ACL 2010; SIGIR 2010]. 17% of unique queries are translations. 2. Feature-based transfer learning: the natural approach in learning-to-rank settings, where each query-document pair is a feature vector (e.g., PageRank, QueryWordsInDoc). The amount of signal carried by a feature can vary by language.

6 Matching feature distributions: e.g., Norm. Estimate the pdf using kernel density estimation. [Figure: estimated probability vs. normalized feature value for Norm.] Goal: quantify the similarity, and tell when the distributions are different.
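To make "estimate the pdf and quantify the similarity" concrete, here is a minimal sketch that fits a Gaussian kernel density estimate to a feature's values in each language and compares the two pdfs with KL divergence. KL is the comparison the Discussion slide uses as a baseline; the evaluation grid, the 10% padding, the `eps` smoothing and the helper name `kde_kl` are our assumptions, not taken from the slides.

```python
import numpy as np
from scipy.stats import entropy, gaussian_kde


def kde_kl(x_eng, x_ger, grid_size=512, eps=1e-12):
    """KL(p_eng || p_ger) between Gaussian-KDE estimates on a shared grid (hypothetical helper)."""
    x_eng = np.asarray(x_eng, dtype=float)
    x_ger = np.asarray(x_ger, dtype=float)
    # Shared evaluation grid covering both samples, padded by 10% on each side.
    lo = min(x_eng.min(), x_ger.min())
    hi = max(x_eng.max(), x_ger.max())
    pad = 0.1 * (hi - lo)
    grid = np.linspace(lo - pad, hi + pad, grid_size)
    # Gaussian KDE per language (scipy's default bandwidth is Scott's rule).
    # Note: a constant feature makes the KDE covariance singular and raises an error.
    p = gaussian_kde(x_eng)(grid) + eps  # eps avoids log(0) in empty regions
    q = gaussian_kde(x_ger)(grid) + eps
    # entropy(p, q) normalizes both to sum to 1 and returns sum(p * log(p / q)).
    return entropy(p, q)
```

For discrete features a frequency-based estimate may be more appropriate than a Gaussian kernel.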

7 Challenges in capturing similarity: Query-document correlation: consider the feature NoQueryWordsInDocTitle for the query "Harry Potter"; the top-10 candidate result documents are likely to take the same value (2). So 50 queries with 20 results each on average give 1000 training instances, but these instances are not independent. Query-set variance: the types of queries differ by region and by the characteristics of the language, and significance tests can't capture these differences. Normalization of the feature. Continuous and discrete features.
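A quick way to see why 1000 correlated instances are not 1000 independent observations is Kish's design effect from survey sampling (a standard correction, not something the slides use): with m documents per query and intra-query correlation ρ, the effective sample size is n / (1 + (m - 1)ρ). The helper name and the choice of ρ values below are illustrative assumptions.

```python
def effective_sample_size(n_queries, docs_per_query, icc):
    """Kish effective sample size for instances clustered by query (hypothetical helper).

    icc is the intra-class correlation of the feature among documents of the same query.
    """
    n = n_queries * docs_per_query
    design_effect = 1 + (docs_per_query - 1) * icc
    return n / design_effect


# 50 queries x 20 results = 1000 instances.
print(effective_sample_size(50, 20, 1.0))  # every result shares the query's value
print(effective_sample_size(50, 20, 0.0))  # fully independent instances
```

With icc = 1.0 (all results of a query share one value, as in the NoQueryWordsInDocTitle example) this prints 50.0; with icc = 0.0 it prints 1000.0. A t-test that treats all 1000 instances as independent therefore overstates its evidence.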


9 Our approach: 1. Fractional Similarity: significance tests such as the t-test may not be useful because of query-set variance. It is a robust method to verify whether two populations have the same mean, and it makes features comparable to each other. 2. Cross-lingual Feature Selection: identify features that are similar across languages. A direct application would compare the means of a feature across languages; instead, use the log-likelihood at random points to compare distributions, while accounting for query-document correlation.


12-29 Fractional Similarity (one figure built up over several slides): Two samples are compared with a t-test, giving the reference p-value $p_{\mathrm{ref}}$. A combined sample is then formed in which a fraction $\alpha$ of the instances comes from the other language, for $\alpha = 0, 0.2, 0.4, 0.6, 0.8, 1.0$, and its p-value $p_\alpha$ is computed against the reference in the same way. All p-values are computed using the t-test. The fractional score is $\mathrm{frac}_\alpha = p_\alpha / p_{\mathrm{ref}}$. As $\alpha$ increases, $p_\alpha$ decreases and so does $\mathrm{frac}_\alpha$. Fractional Similarity is the maximum allowed $\alpha$ such that $\mathrm{frac}_\alpha \ge C$, found by binary search for $\alpha$ in $[0, 1]$:

$$\alpha^{*} = \max\{\alpha \in [0, 1] : p_\alpha / p_{\mathrm{ref}} \ge C\}$$
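The slides give the recipe (reference p-value, mixed sample, frac_α = p_α / p_ref, binary search for the largest α with frac_α ≥ C) but not the threshold C or the exact mixing scheme. The following is a minimal sketch under stated assumptions, not the authors' implementation: `ref` and `same` are two samples of a feature from the target language (their t-test gives p_ref), `other` holds the other language's values, and the combined sample replaces the first α fraction of `same` with values from `other`. The names `make_frac` and `fractional_similarity`, the default C = 0.5 and the tolerance are hypothetical. It uses `scipy.stats.ttest_ind` (Student's two-sample t-test).

```python
import numpy as np
from scipy import stats


def make_frac(ref, same, other):
    """Return alpha -> frac_alpha = p_alpha / p_ref (hypothetical helper)."""
    ref, same, other = (np.asarray(a, dtype=float) for a in (ref, same, other))
    if len(other) < len(same):
        raise ValueError("need at least len(same) instances from the other language")
    # Reference p-value: both samples come from the same language (intra-language variance).
    p_ref = stats.ttest_ind(ref, same).pvalue
    if p_ref == 0:
        raise ValueError("reference p-value is zero; frac_alpha is undefined")

    def frac(alpha):
        # Assumed mixing scheme: swap the first alpha-fraction of `same` for `other`.
        k = int(round(alpha * len(same)))
        combined = np.concatenate([other[:k], same[k:]])
        p_alpha = stats.ttest_ind(ref, combined).pvalue
        return p_alpha / p_ref

    return frac


def fractional_similarity(ref, same, other, C=0.5, tol=1e-3):
    """Largest alpha in [0, 1] with frac_alpha >= C, found by binary search.

    Assumes frac_alpha is (roughly) non-increasing in alpha, as the slides observe.
    """
    if not 0 < C <= 1:
        raise ValueError("C must lie in (0, 1]")
    frac = make_frac(ref, same, other)
    if frac(1.0) >= C:
        return 1.0
    lo, hi = 0.0, 1.0  # invariant: frac(lo) >= C (frac(0) == 1) and frac(hi) < C
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if frac(mid) >= C:
            lo = mid
        else:
            hi = mid
    return lo
```

A feature whose α* stays close to 1 looks like the target language even when most of the mixed sample comes from the other language; a small α* flags a mismatched feature. Because frac_α is a ratio to p_ref, the score stays informative even when raw t-test p-values are all near zero.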

30 Cross-lingual Feature Selection: A direct application of Fractional Similarity would compare means; instead, we want to compare pdfs. Use of log-likelihood: we estimate a pdf from English and compute the likelihood of the English and German samples. This enables comparison of pdfs and applies to both discrete and continuous features. Query-document correlation is handled by query-based sampling and an aggregate statistic per query.
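The log-likelihood step can be sketched as follows: fit a pdf on English values of a feature, then score each query by the mean log-density of its documents, which gives one aggregate value per query rather than one per correlated document. This is an illustration under assumptions: the Gaussian KDE, fitting on a separate English sample, the mean as the aggregate statistic, and the helper names `per_query_loglik` and `loglik_samples` are ours, not the slides'.

```python
import numpy as np
from scipy.stats import gaussian_kde


def per_query_loglik(kde, queries):
    """Aggregate statistic per query: mean log-density of that query's feature values."""
    return np.array([kde.logpdf(np.asarray(v, dtype=float)).mean() for v in queries])


def loglik_samples(eng_fit_values, eng_queries, ger_queries):
    """Fit a pdf on English values, then score held-out English and German queries.

    eng_queries / ger_queries: lists with one array of feature values per query.
    """
    kde = gaussian_kde(np.asarray(eng_fit_values, dtype=float))
    return per_query_loglik(kde, eng_queries), per_query_loglik(kde, ger_queries)
```

The two returned arrays are per-query samples, so they can be compared across languages (for example with the fractional-similarity procedure) without treating each document as an independent instance. For discrete features, swapping the KDE for a smoothed frequency estimate is one option.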

31 Experiments
Data sets: English and German. Queries and documents are sampled from a web search engine, with graded human relevance judgments and 347 common features.

              English   German
# Queries     15K       7K
# URLs/query
# Features

LambdaMART is used to train the ranker; it outperformed other approaches in the Yahoo! Learning to Rank Challenge.

32 Using English data: Adapt: train a ranker on English and adapt it to German. Align: simply train on both English and German data. Compared systems: German only (baseline), Eng_adapt+German and Eng_align+German, each reported as Δ over the baseline. Simply adding the English training data of the common features (Align) performed better.

33 Cross-lingual Feature Selection: Rank all the common features by similarity score. Drop the data of mismatched features. Add the filtered English data to the German training data. Baseline: use all the common features (Align, from the previous slide).
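The selection step can be sketched as data preparation: rank features by similarity, blank out the English values of the least similar ones, and stack the filtered English data with the German data. We interpret "drop the data of mismatched features" as setting the English values to NaN so a ranker learns those features from German only; that interpretation, the helper names, and the array layout (rows are query-document pairs, columns are the common features in the same order for both languages) are assumptions. The default `drop_frac=0.25` follows the Discussion slide's observation that removing about 25% gave the best performance. Many gradient-boosted tree libraries, LightGBM among them, treat NaN as missing by default.

```python
import numpy as np


def drop_mismatched(eng_X, sim_scores, drop_frac=0.25):
    """Blank out (NaN) the English values of the least similar features (hypothetical helper)."""
    sim_scores = np.asarray(sim_scores, dtype=float)
    n_drop = int(round(drop_frac * len(sim_scores)))
    dropped = np.argsort(sim_scores)[:n_drop]  # ascending: least similar first
    X = np.array(eng_X, dtype=float)  # copy so the caller's array is untouched
    X[:, dropped] = np.nan
    return X, dropped


def build_training_set(eng_X, eng_y, eng_groups, ger_X, ger_y, ger_groups,
                       sim_scores, drop_frac=0.25):
    """Stack filtered English data with German data (common features only)."""
    X_eng, dropped = drop_mismatched(eng_X, sim_scores, drop_frac)
    X = np.vstack([X_eng, np.asarray(ger_X, dtype=float)])
    y = np.concatenate([eng_y, ger_y])
    # Documents per query, in row order, as LambdaMART-style rankers expect.
    groups = np.concatenate([eng_groups, ger_groups])
    return X, y, groups, dropped
```

Setting `drop_frac=0.0` reproduces the Align baseline (all common features kept).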

34 Discussion: Using English training data is helpful, and cross-lingual feature selection improves further. Fractional Similarity identifies similar features better than KL, as evidenced by its consistent improvement over KL. For both methods, the improvement drops at higher ranks. Aggressive feature selection also hurts: removing ~25% of the features gave the best performance. Theoretical arguments are outlined in the paper.

35 Conclusions: For features with high variance, traditional significance tests are not useful: they give p-values of almost zero. Fractional Similarity overcomes this by using intra-language variance, which increases robustness. It is not limited to the IR setting: it generalizes to situations with correlated instances and applies to both discrete and continuous features, given an appropriate choice of pdf estimation technique.

36 Thank You
